AD-A213  678 


Navy  Personnel  Research  and  Development  Center 

San Diego, CA 92152-6800    TR 89-13    July 1989


Officer  Career  Development: 
Analytic  Strategy  Recommendations 


NPRDC TR 89-13


July 1989


Officer  Career  Development:  Analytic  Strategy  Recommendations 


Lawrence R. James
Christopher  K.  Hertzog 
Georgia  Institute  of  Technology 


Reviewed  by 
Robert F. Morrison


Approved  by 
John J. Pass

Director,  Personnel  Systems  Department 


Released  by 
B.  E.  Bacon 
Captain,  U.S.  Navy 
Commanding  Officer 

and 

James S. McMichael
Technical  Director 


Approved  for  public  release; 
distribution  is  unlimited. 


Navy  Personnel  Research  and  Development  Center 
San Diego, California 92152-6800


DISCLAIMER  NOTICE 


THIS  DOCUMENT  IS  BEST 
QUALITY AVAILABLE. THE COPY
FURNISHED  TO  DTIC  CONTAINED 
A  SIGNIFICANT  NUMBER  OF 
PAGES  WHICH  DO  NOT 
REPRODUCE  LEGIBLY. 


REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION

1b. RESTRICTIVE MARKINGS

2a. SECURITY CLASSIFICATION AUTHORITY

2b. DECLASSIFICATION/DOWNGRADING SCHEDULE

3. DISTRIBUTION/AVAILABILITY OF REPORT
Approved for public release; distribution is unlimited.

4. PERFORMING ORGANIZATION REPORT NUMBER(S)
5. MONITORING ORGANIZATION REPORT NUMBER(S)
NPRDC TR 89-13
TCN 87-621

6a. NAME OF PERFORMING ORGANIZATION
School of Psychology, Georgia Institute of Technology

6b. OFFICE SYMBOL (if applicable)

6c. ADDRESS (City, State, and ZIP Code)
Atlanta, GA 30332

7a. NAME OF MONITORING ORGANIZATION
U.S. Army Research Office

7b. ADDRESS (City, State, and ZIP Code)

8a. NAME OF FUNDING/SPONSORING ORGANIZATION
Navy Personnel Research and Development Center

8c. ADDRESS (City, State, and ZIP Code)
San Diego, CA 92152-6800

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER

10. SOURCE OF FUNDING NUMBERS
PROGRAM ELEMENT NO., PROJECT NO., TASK NO., WORK UNIT ACCESSION NO.: [illegible]

11. TITLE (Include Security Classification)
Officer Career Development: Analytic Strategy Recommendations

12. PERSONAL AUTHOR(S)
Lawrence R. James and Christopher K. Hertzog

13a. TYPE OF REPORT
Final

13b. TIME COVERED
From Sep 87 to Mar 88

14. DATE OF REPORT (Year, Month, Day)
1989 July

15. PAGE COUNT
116

17. COSATI CODES

18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
Analytic strategy, latent variable time series, cohort analysis, moderator analysis

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

Strategies are recommended for analyzing information from the data bank developed by the
Personnel Distribution and Career Development (PDCD) work unit for the purpose of establishing
empirically-based decision guides to assist in the design and implementation of career policy and
practice in the U.S. Navy. A set of analytic models is proposed wherein each model addresses an
important issue concerning the development of empirically-based decision guides for career
development. The statistical assumptions underlying each model are reviewed, as are methods that
may be used to reasonably satisfy these assumptions. Estimation techniques and procedures for
avoiding common errors in estimation also receive attention.

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT
[X] UNCLASSIFIED/UNLIMITED   [ ] SAME AS RPT.   [ ] DTIC USERS

21. ABSTRACT SECURITY CLASSIFICATION

22a. NAME OF RESPONSIBLE INDIVIDUAL
Robert Morrison

22c. OFFICE SYMBOL
Code 12

DD FORM 1473, 84 MAR
83 APR edition may be used until exhausted.
All other editions are obsolete.

SECURITY CLASSIFICATION OF THIS PAGE
UNCLASSIFIED


FOREWORD 


This  report  focuses  on  a  review  of  the  data  bank  developed  with  the  Personnel 
Distribution  and  Career  Development  (PDCD)  work  unit  data  base  and  the  proposed 
analytical  strategy.  It  describes  problems  inherent  in  the  data  and  recommends 
techniques  and  strategies  to  overcome  them. 

This  is  the  second  of  two  reports  completed  with  TCN  87-621  with  Robert  F. 
Morrison  as  the  contracting  officer's  technical  representative.  The  TCN  was  conducted 
within exploratory development (Program Element 0602233N, work unit number
14SSWX4B529,  Personnel  Distribution  and  Career  Development)  under  the  sponsorship  of 
the  Chief  of  Naval  Research  (ONR  222).  This  report  is  the  fifteenth  published  within 
PDCD  and  is  intended  for  use  in  the  PDCD  work  unit. 


B. E. BACON                    JAMES S. McMICHAEL
Captain, U.S. Navy             Technical Director
Commanding Officer

Prior  PDCD  Publications 

1. Cook, T. M., & Morrison, R. F. (1982, August). Surface warfare junior officer
retention:  Early  career  development  factors  (NPRDC  TR  82-59).  San  Diego:  Navy 
Personnel  Research  and  Development  Center. 

2.  Cook,  T.  M.,  &  Morrison,  R.  F.  (1983,  January).  Surface  warfare  junior  officers 
retention:  Background  and  first  sea  tour  factors  as  predictors  of  continuance  beyond 
obligated  service  (NPRDC  TR  83-6).  San  Diego:  Navy  Personnel  Research  and 
Development  Center. 

3.  Morrison,  R.  F.  (1983,  July).  Officer  career  development:  Surface  warfare  officer 
interviews (NPRDC TN 83-10). San Diego: Navy Personnel Research and Development
Center. 

4.  Morrison,  R.  F.,  Martinez,  C.,  &  Townsend,  F.  W.  (1984,  March).  Officer  career 
development:  Description  of  aviation  assignment  decisions  in  the  antisubmarine  warfare 
(ASW) patrol community (NPRDC TR 84-31). San Diego: Navy Personnel Research and
Development  Center. 

5.  University  of  San  Diego  (1984,  October  23-25).  Proceedings:  Volume  1.  Group 
reports.  Tri-service  career  research  workshop.  San  Diego:  Continuing  Education, 
University  of  San  Diego.  (Author) 

6. Morrison, R. F., & Cook, T. M. (1985, February). Military officer career development
and  decision  making:  A  multiple-cohort  longitudinal  analysis  of  the  first  24  years 
(NPRDC  MPL  TN  85-4).  San  Diego:  Navy  Personnel  Research  and  Development  Center, 
Manpower  and  Personnel  Laboratory. 

7. Wilcove, G. L., Bruni, J. R., & Morrison, R. F. (1987, August). Officer career
development:  Reactions  of  two  unrestricted  line  communities  to  detailers  (NPRDC  TN 
87-40).  San  Diego:  Navy  Personnel  Research  and  Development  Center. 


8. Morrison, R. F. (1988, March). Officer career development: URL officers in joint-
duty assignments (NPRDC TN 88-26). San Diego: Navy Personnel Research and
Development Center.

9. Wilcove, G. L. (Ed.) (1988, August). Officer career development: Problems of three
unrestricted  line  communities  (NPRDC  TR  88-26).  San  Diego:  Navy  Personnel  Research 
and  Development  Center. 

10. Wilcove, G. L. (1988, September). Officer career development: General unrestricted
line officer perceptions of the dual-career track (NPRDC TN 88-62). San Diego: Navy
Personnel  Research  and  Development  Center. 

11. Bruni, J. R., & Wilcove, G. L. (1988, October). Officer career development:
Preliminary surface warfare officer perceptions of a major career path change (NPRDC
TN 89-5). San Diego: Navy Personnel Research and Development Center.

12. Bruce, R. (1989, June). Officer career development: Fleet perceptions of the
aviation duty officer program (NPRDC TN 89-2[illegible]). San Diego: Navy Personnel Research
and  Development  Center. 

13. Bruce, R., & Burch, R. (1989, June). Officer career development: Modeling married
aviator  retention  (NPRDC  TR  89-11).  San  Diego:  Navy  Personnel  Research  and 
Development  Center. 

14. James, L. R., & Hertzog, C. K. (in review). Officer career development: An
overview  of  analytic  concerns  for  the  research.  San  Diego:  Navy  Personnel  Research  and 
Development  Center. 



SUMMARY 


Problem 

A large data bank has been developed by the Personnel Distribution and Career
Development (PDCD) work unit for the purpose of establishing empirically-based decision
guides to assist in the design and implementation of career policy and practice in the
U.S. Navy. Data banks of this magnitude often engender special methodological problems
during analysis.

Purpose 

To  recommend  analytic  strategies  that  consider  not  only  the  special  methodological 
problems  that  might  arise  in  the  analysis  of  the  large  data  bank  but  also  the  need  to 
develop  effective  and  practical  models  for  explaining  and  forecasting  continuance  in  the 
Navy, occupational development, and upward mobility in the Navy.

Approach 

Analytic strategies are recommended to test causal models for continuance, occupational
development, and upward mobility. The strategies involve consideration of (1) the
types of analytic models that could be employed to conduct statistical analyses on the
data; (2) the conceptual and statistical requirements or assumptions for each analytic
model, with accompanying discussion of practical means by which assumptions might be
"reasonably satisfied"; (3) actual statistical estimation procedures; and (4) likely
specification errors, which refer to problems in estimation and attempts to fit models
that occur often in practice.

Emphasis  is  placed  on  practical  models  and  designs  that  provide  straightforward 
means  for  testing  causal  models.  However,  more  sophisticated  statistical  strategies  are 
reviewed in the latter part of the report. Such strategies may be useful for analyses
designed  for  more  scientifically  oriented  audiences. 


INTRODUCTION


The objective of this second of two reports is to recommend analytic strategies to
test causal models for three key career outcome variables, namely continuance within the
Navy, occupational development, and upward mobility within the Navy. This report
augments the first report (Report 1), which reviewed the data bank developed by the
Personnel Distribution and Career Development (PDCD) work unit in the conduct of
research designed to assist in the design and implementation of career policy and practice
in the Navy. The first report also considered basic concerns pertaining to analytic
strategies for testing career development models. The purpose of this report is to furnish
greater breadth and depth in regard to analytic strategies by considering (1) types of
analytic models that could be employed to conduct statistical analyses on the data; (2) the
conceptual and statistical requirements or assumptions for each analytic model, with
accompanying discussion of practical means by which assumptions might be "reasonably
satisfied"; (3) actual statistical estimation procedures; and (4) likely specification errors,
which refer to problems in estimation and attempts to fit models that occur often in
practice.

It  is  recognized  that  the  PDCD  work  unit  has  already  devoted  considerable  time  and 
effort to analytic concerns, including major scaling efforts on the 1982 and 1986 waves of
data  and  development  of  exploratory  models  for  the  outcome  variables.  It  is  also 
recognized  that  the  analytic  strategies  of  paramount  importance  at  the  present  time  are 
those  that  will  provide  the  Navy  with  effective  yet  practical  models  for  explaining  and 
forecasting  continuance,  occupational  development,  and  upward  mobility.  Consequently, 
we  will  focus  on  observed  or  "manifest"  variables  designs  and  both  analytic  models  and 
statistical  strategies  that  provide  straightforward  and  practical  means  for  testing  causal 
models.  We  will  address  the  use  of  more  sophisticated  analytic  models  and  statistical 
strategies  (e.g.,  latent  variable  models)  at  the  conclusion  of  this  report.  It  is  hoped  that 
these discussions will be useful for analyses designed for more scientifically oriented
audiences. 

This report is presented in four sections that correspond to the natural sequencing of
analyses (Skinner, [date illegible]), plus a short summary. Section I addresses scale
development. We shall concentrate on potential problems with the use of developed scales
in the proposed confirmatory (causal) analyses. Section II pertains to analytic strategies
that may be used to test manifest variable causal models within subgroups defined by salient
moderators, such as community and career stage. Models, assumptions, statistical
techniques, and likely specification errors are considered. Section III is devoted to
analytic strategies for comparing the causal models developed in the Section II analyses
among two or more subgroups. These are moderator or homogeneity of regression
analyses, and models, assumptions, statistical techniques, and likely specification errors
are again considered. Section IV is devoted to brief discussions of more sophisticated
techniques, including latent variable confirmatory analysis, event-history analysis, and
logit analysis. Section V presents a brief summary of key recommendations for future
research.

It  is  noteworthy  that  this  report  is  designed  to  present  an  overview  of  analytic 
strategies,  with  special  emphasis  on  assumptions  and  potential  specification  errors.  We 
relied  heavily  on  the  published  literature  from  various  statistical  areas.  However,  we  will 
be happy to extend and elaborate on special topics in this report, as required by the
PDCD  work  unit. 


SECTION I: SCALE DEVELOPMENT


The decision was made by the PDCD work unit to focus initial research efforts on
scales that are common to the two waves of data (i.e., the 1982 wave and the 1986 wave).
Inspection of these (common) scales indicates that (1) internal consistency estimates of
reliabilities, based on coefficient alpha, tend to be greater than or equal to .75 even
though many of the scales (item composites) have only a few items (i.e., three to five
items), and (2) the items comprising a particular scale tend, by rational examination, to be
assessing a common construct. While much potentially remains to be done regarding tests
of the psychometric properties of the data, generally moderate to high reliabilities and
scales that make rational sense are good starting points, especially for the practical
analyses of primary concern here.
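
As an illustration of the internal consistency estimate referred to above, the following is a minimal sketch of coefficient alpha computed from an item-by-respondent array. The data are simulated and the function name is hypothetical; nothing here is taken from the PDCD data bank. With four items that each share half their variance with a common construct, alpha should fall near .80, which shows how a short scale can still reach the .75 level.

```python
import numpy as np

def coefficient_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Simulated (hypothetical) data: four items driven by one common construct.
rng = np.random.default_rng(0)
construct = rng.normal(size=500)
scale_items = construct[:, None] + rng.normal(scale=1.0, size=(500, 4))

alpha = coefficient_alpha(scale_items)
print(f"coefficient alpha = {alpha:.2f}")
```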

We have one principal concern for these practical analyses. This concern derives
from the fact that a large number of manifest causal variables (scales) may be relevant to
a particular causal model and thus entered into a confirmatory analysis for that model. In
Report 1, we noted that use of a potentially large number of manifest scales in a
confirmatory analysis increases the probability of multicollinearity (Gordon, 1968).
Products of multicollinearity include large standard errors for ordinary least-squares
(OLS) coefficients (regression weights), which spuriously detract from findings of
significant relations, and instability in the OLS estimates themselves (cf. Johnston, 1984).
We recommended use of latent variable designs as a possible solution to the potential
multicollinearity problem. However, given the decision to proceed initially with manifest
variable designs, alternatives are needed. We suggest the following procedures.

1. Correlations among causal variables entering into a particular equation for OLS
analyses or an overall model for LISREL analyses need to be examined. A "very high
correlation" (e.g., > .75) suggests the possibility of an ensuing multicollinearity condition
in the regression/LISREL analysis.

2. Examination of bivariate correlations is often not sufficient to identify potential
multicollinearity conditions because no one bivariate correlation is very high. However,
one or more causal variables may be linearly dependent on some subset of the remaining
causal variables, which does create a multicollinearity condition. Checks for linear
dependence may be made by regressing each causal variable in a causal system (e.g.,
causal or structural equation) on the other causal variables in that system (i.e., each of K
causal variables is regressed on the remaining K-1 causal variables). If the squared
multiple correlation (i.e., R², or SMC) for a particular variable is high, then this variable
may be linearly dependent on the other variables in the system and inclusion of this
variable in analyses may create a multicollinearity condition. (Common factor analysis
programs often furnish the R²s of interest here inasmuch as R²s--SMCs--are often used
as initial estimates of communalities.)

3. Results of confirmatory analyses should be checked carefully for indications of
multicollinearity, or "near multicollinearity" (Johnston, 1984, p. 2^5). Very large standard
errors for regression coefficients, estimated regression coefficients that change with
small changes in the data (e.g., random addition or deletion of a small number of cases), a
large R² with few significant regression coefficients, and a pattern of bivariate
correlations are indicative of multicollinearity and near multicollinearity problems.

4. There exist numerous remedies to the (near) multicollinearity problem (see
Johnston, 1984, pp. 250-259). The most direct and practical remedies are:

a. Delete some causal variables from a set of highly correlated causal
variables. For example, if one has three causal variables that intercorrelate .90, then
drop two of the variables.

b.  Form  a  composite  of  highly  correlated  causal  variables.  This  alternative 
accomplishes  some  of  the  same  objectives  as  a  latent  variable  approach,  given  that  the 
manifest  variables  to  be  combined  are  measures  of  the  same  construct.  We  recommend 
that  only  indicators  of  the  same  construct  be  combined.  Theory,  substantive  content  of 
variables,  bivariate  correlations,  and  perhaps  a  factor  analysis  could  be  employed  to 
ascertain  whether  variables  are  measures  of  the  same  construct.  We  might  also  note  that 
we  prefer  this  alternative  to  the  deletion  of  variables,  a  key  reason  being  that  reliabilities 
of  the  variables  used  in  analyses  are  likely  to  be  enhanced  by  forming  composites. 

c. Use block-recursive forms of analyses (cf. Namboodiri, Carter, & Blalock,
1975, pp. 526-530). Block recursive analysis is similar to regression analyses based on sets
of independent variables (cf. Cohen & Cohen, 1983, Chapt. 4), and is often applied in
complex designs. Sets of theoretically related variables are identified and grouped into
blocks of variables (e.g., environmental, career counseling, motivation, affect, etc.). A
causal model is then constructed for, in this case, a single dependent or "endogenous"
variable (e.g., career intent), but the causal mechanisms are represented by blocks of
variables rather than by single variables. Analysis then proceeds by introducing one block
of variables at a time into an OLS equation--that is, a hierarchical regression analysis
(see Cohen & Cohen, 1983, Chapt. 4). For each block of variables, only the change in the
R² is interpreted (i.e., the degree to which introduction of this set of variables enhanced
prediction). No attempt is made to interpret the regression weights for individual
variables (within blocks) because of the likelihood of multicollinearity.
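
The check for linear dependence described in procedure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated, the function name is hypothetical, and the variance inflation factor, 1 / (1 - SMC), is a standard companion index that the procedures above do not name. The simulated fourth variable is nearly the sum of the other three, so no bivariate correlation exceeds .75 even though every SMC is high.

```python
import numpy as np

def squared_multiple_correlations(X):
    """SMC (R^2) from regressing each column of X on the remaining K-1 columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    smc = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Intercept plus the other K-1 causal variables.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        smc[j] = 1.0 - resid.var() / y.var()
    return smc

# Simulated (hypothetical) causal variables: the fourth is almost a linear
# combination of the first three.
rng = np.random.default_rng(1)
base = rng.normal(size=(1000, 3))
fourth = base.sum(axis=1) + rng.normal(scale=0.3, size=1000)
X = np.column_stack([base, fourth])

corr = np.corrcoef(X, rowvar=False)
max_bivariate = np.abs(corr - np.eye(4)).max()   # largest off-diagonal |r|
smc = squared_multiple_correlations(X)
vif = 1.0 / (1.0 - smc)                          # variance inflation factors

print("largest bivariate |r|:", round(float(max_bivariate), 2))
print("SMCs:", np.round(smc, 2))
print("VIFs:", np.round(vif, 1))
```

How high an SMC must be before a variable is dropped or combined is a judgment call that this sketch does not settle.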

In sum, we suggest the judicious use of alternative "b" (forming combinations) when
combinations of variables are clearly indicated, followed by the use of block recursive
models if multicollinearity still appears to be a problem, which is quite possible in
complex designs involving many causal variables. Later, in Section IV, we shall address
additional scaling issues. Of special concern is the use of latent variable models to
compare factor structures (measurement models) over subgroups defined by key
moderator variables such as career stage and cohort.
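
The block-recursive (hierarchical regression) approach recommended as alternative "c" can be sketched as below. The block names, simulated scales, and function names are hypothetical and serve only to show the mechanics: blocks enter the OLS equation one at a time, and only the change in R² for each block is reported, with no individual regression weights.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()

def hierarchical_r2_change(blocks, y):
    """Enter blocks in the given order; return (name, R^2, change in R^2) per step."""
    y = np.asarray(y, dtype=float)
    entered = np.empty((len(y), 0))
    previous_r2 = 0.0
    steps = []
    for name, block in blocks:
        block = np.asarray(block, dtype=float).reshape(len(y), -1)
        entered = np.hstack([entered, block])
        r2 = r_squared(entered, y)
        steps.append((name, r2, r2 - previous_r2))
        previous_r2 = r2
    return steps

# Simulated (hypothetical) blocks, each holding two highly correlated scales.
rng = np.random.default_rng(2)
n = 400
environment = rng.normal(size=(n, 1)) + rng.normal(scale=0.3, size=(n, 2))
motivation = rng.normal(size=(n, 1)) + rng.normal(scale=0.3, size=(n, 2))
career_intent = (environment.mean(axis=1) + 0.5 * motivation.mean(axis=1)
                 + rng.normal(size=n))

steps = hierarchical_r2_change(
    [("environment", environment), ("motivation", motivation)], career_intent)
for name, r2, delta in steps:
    print(f"{name}: R2 = {r2:.3f}, change in R2 = {delta:.3f}")
```

The order of entry matters for the increments, so in practice it would be fixed by the causal model rather than by the data.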


SECTION II: ANALYTIC STRATEGIES FOR INITIAL
TESTS OF CAUSAL MODELS

The general model of career development proposed by Morrison and Cook (1985) and
reviewed in Report 1 suggests that it is unlikely that a single causal model will suffice to
explain all continuance decisions (or all decisions pertaining to either occupational
development or upward mobility). Rather, a series of moderators likely bound or limit the
generalizability of a particular causal model to an identifiable subset of the data (i.e., a
subgroup). Three potentially salient sources of moderation are: (1) community (SWO,
AWO, URL(G)) as well as subcommunities within communities (e.g., AWO-P and
AWO-NFO); (2) career stage, which refers to key career choice points (Morrison, 1983) and
was illustrated in terms of "social cohorts" (Morrison & Cook, 1985) in Report 1 (see p. 11);
and (3) generational differences, which refers to basic differences among the members of
different cohorts. Note that career stage refers to a form of sequential moderation
wherein causal models for career decisions differ for the same individuals over time
(Ghiselli, 1956; James, Joe, & Irons, 1982; James & Tetrick, 1984), whereas generational
differences refers to variations in causal models for different groups of individuals
defined by year of commissioning.

It is anticipated that the PDCD work unit will combine career development theory,
knowledge of Navy practices, and empirical data to define meaningful subgroups for
analyses. (If possible, please note our recommendation in Report 1 to avoid clustering by
empirical similarity using profile analytic techniques.) We devote Section II of this report
to analytic strategies for initial tests of causal models within the subgroups so defined by
the PDCD unit. Section III addresses comparisons of models among subgroups--that is,
moderator analyses. Statistical recommendations are made in Section II that will prepare
the data and initial results for the moderator analyses proposed in Section III.

Analytic  Models 


We begin by briefly reviewing the types of manifest variable analytic models that
potentially could be applied to the Navy career development data to answer salient,
practical problems. As presented in Report 1, these analytic models include:

1. Cross-sectional models (Figure 1a): The key to these models is that all data
were collected at approximately the same time for a particular individual. An example is
a model developed for the 1982 wave (or the 1986 wave) data to explain career intent for
officers in the SWO community who have been in the Navy for 18 to 39 months.

2. Longitudinal model (Figure 1b): As applied to this study, a longitudinal model is
typically one in which the data on causal variables are collected cross-sectionally by
questionnaire, but data on the key endogenous (criterion, dependent) variable are collected
at a later date. An obvious example is a combination of the cross-sectional model
illustrated above with data on continuance (retention) collected on a longitudinal basis.
Additional illustrations of this form of model are presented in Figures 2 and 3 of Report 1.

3. Nonlagged, cross-sectional time series (Figure 1c): As shown in Figure 4 of
Report 1 and as discussed on pages 15 and 16 of that report, this form of analytic model
requires that repeated measurements be taken on multiple individuals at two or more
points in time and that (a) all causal effects take place within specified time intervals and
(b) there are no lagged causal effects from one time interval to the next time interval (cf.
Nerlove, 1971; Hannan & Young, 1977; Johnston, 1984). It is unlikely that this model will
receive much attention in the career development research because of the number of
hypothesized lagged effects in the Morrison and Cook (1985) career development model.

4. Lagged cross-sectional time series (Figure 1d): The lagged form of cross-
sectional time series is again based on repeated measures from multiple individuals over
time. Here, however, variables measured at one point in time (e.g., 1982) are causes of
variables measured at another point in time (e.g., 1986). When an endogenous variable
such as career intent is viewed as a cause of itself over time (see Figure 1d and pages 15
and 17 in Report 1), then the model takes the form of a "lagged endogenous variable,
cross-sectional time series" (cf. James & Singh, 1978; Johnston, 1984; Ostrom, 1978).
Unfortunately, with but two waves of measurement, the model is not a complete lagged
endogenous variable, cross-sectional time series because a third wave of data is needed to
test key hypotheses and to effect what are likely the most appropriate statistical
analyses. Nevertheless, it is expected that this analytic model will be useful in the
practically oriented analyses of primary concern here. Consequently, we will devote
considerable attention to this model. Note also the opportunity to add longitudinally
measured endogenous variables (e.g., continuance (y^)) to the design.

Figure 1. Potential analytic models for confirmatory analyses.


In sum, we have four analytic models that, while not exhaustive of all possible
analytic models, will be the key models used to test causal hypotheses within subgroups
defined by salient moderators. As noted, the nonlagged cross-sectional time series has
limited applicability and thus is not considered further in this section of the report. Each
of the three remaining analytic models could be employed to test salient hypotheses for
each of the three criteria. Cross-sectional analyses could be conducted for the 1982
and/or the 1986 waves of data (within subgroups) for endogenous variables represented by
"decisions" collected by means of questionnaires (see document entitled Outcome
Variables: Career Decisions and Actions). Longitudinal analyses could be conducted for the
1982 and/or 1986 waves for each of the three key endogenous variables because each such
variable has a longitudinal component (represented by "actions" in the document noted
above). Finally, the lagged cross-sectional time series design is applicable for individuals
who have both 1982 and 1986 questionnaire data.
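
For individuals with both waves of questionnaire data, the two-wave lagged endogenous variable design can be sketched as a single OLS equation in which 1986 career intent is regressed on 1982 career intent and a 1982 causal variable. Everything in this sketch is an assumption made for illustration: the variable names, the simulated data, and the generating coefficients (0.5 and 0.3). OLS is used only to show the form of the equation; with a lagged endogenous variable and serially correlated disturbances OLS can be biased, which is one reason a third wave is desirable.

```python
import numpy as np

def fit_lagged_model(y_later, y_earlier, x_earlier):
    """OLS fit of y_later = b0 + b1 * y_earlier + b2 * x_earlier + e."""
    n = len(y_later)
    design = np.column_stack([np.ones(n), y_earlier, x_earlier])
    coef, *_ = np.linalg.lstsq(design, y_later, rcond=None)
    resid = y_later - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])   # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return coef, std_err

# Simulated (hypothetical) two-wave data.
rng = np.random.default_rng(3)
n = 2000
x82 = rng.normal(size=n)                     # causal variable, 1982 wave
intent82 = 0.4 * x82 + rng.normal(size=n)    # career intent, 1982 wave
intent86 = 1.0 + 0.5 * intent82 + 0.3 * x82 + rng.normal(size=n)

coef, std_err = fit_lagged_model(intent86, intent82, x82)
for label, b, se in zip(["intercept", "intent82", "x82"], coef, std_err):
    print(f"{label}: b = {b:.3f} (SE = {se:.3f})")
```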

Conceptual  and  Statistical  Requirements  for  Analytic  Models 

We  shall  focus  here  on  general  conditions  that  are  required  to  subject  a  theoretical 
model  to  confirmatory  analysis,  as  discussed  by  James,  Mulaik  and  Brett  (1982),  and  on 
general  statistical  assumptions  required  of  manifest  level  confirmatory  analyses,  with 
additional  attention  to  specific  assumptions  required  for  longitudinal  models  and  lagged 
cross-sectional  time  series.  Statistical  assumptions  that  are  associated  with  specific 
estimation  techniques  are  addressed  later  in  discussions  of  these  techniques. 

The seven conditions pertaining to the appropriateness of theoretical models for
confirmatory analysis presented by James et al. (1982) are reproduced in Figure 2. The
extensive theoretical development and modeling that preceded development of the
multiple questionnaires suggests reasonable satisfaction of Conditions 1 and 2 (cf.
Morrison & Cook, 1985). (By reasonable satisfaction, we refer to what is scientifically
acceptable even though imperfect.) Condition 6--specification of boundaries--pertains
primarily to the moderator analyses (nonadditivity) that are the subject of Section III
of this report. Condition 4--specification of causal direction--is, like Conditions 1 and 2,
already reasonably satisfied inasmuch as the career development models to be tested in
the initial analyses are "recursive" (i.e., all causal relations are unidirectional). Later, in
more scientifically oriented analyses, the PDCD work unit may wish to consider tests of
selected nonrecursive relations inasmuch as the Morrison and Cook (1985) general model
of career development presumes a number of dynamic relations (see James & Jones, 1980;
James & Singh, 1978; James & Tetrick, 1984 for illustrated uses of nonrecursive models in
psychology). Nonlinearities in some causal relations, an issue included in Condition 6,
might also be considered in these later analyses.

This  leaves  us  with  Conditions  3,  5,  and  7  as  pertinent  to  the  case  at  hand.  We  begin 
with  Condition  7,  which  states  that  structural  (causal)  models  should  be  stable.  Stability 
is indicated by invariance of values of structural parameters over specified time intervals,
which technically is referred to as "stationarity." Appropriate lengths of time intervals
vary  with  variables  and  models,  but  the  general  idea  is  that  a  time  interval  should  be  of 
sufficient  length  to  allow  for  scientific  inferences  and  generalizations.  On  the  other 
hand,  there  is  no  assumption  that  the  model  or  structural  parameters  are  set  in  concrete. 
That  is,  change  in  the  parameters  is  allowed  over  different  time  intervals,  such  as 
different  career  s*ages.  Indeed,  stability  of  structural  parameters  across  different  time 
intervals  (career  stages)  is  an  empirically  testable  question  if  data  are  available. 

Stationarity is testable using both the cross-sectional and longitudinal models.
Indeed, such tests will be a salient concern in the moderator analyses discussed in
Section III. Stationarity of the lagged cross-sectional time series, or lagged CSTS, cannot
be tested until a third wave of data is collected.
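
As a rough illustration of what a stationarity check involves, the sketch below fits the
same bivariate structural equation in two career-stage subsamples and computes a
large-sample z statistic for the difference between the two unstandardized slopes. The
function names, the data, and the choice of a simple z test for independent samples are
assumptions made for illustration; they are not the procedure used by the work unit,
which would more likely rely on the moderator tests of Section III.

```python
import math

def slope_and_se(x, y):
    """OLS slope of y on x and its standard error (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b, se

def slope_difference_z(b1, se1, b2, se2):
    """Large-sample z for H0: equal slopes in two independent samples."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical data: the same cause (x) and effect (y) in two career stages.
x_early = [1, 2, 3, 4, 5, 6]
y_early = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
x_late = [1, 2, 3, 4, 5, 6]
y_late = [1.0, 3.2, 4.9, 7.1, 9.0, 10.8]

b_early, se_early = slope_and_se(x_early, y_early)
b_late, se_late = slope_and_se(x_late, y_late)
z = slope_difference_z(b_early, se_early, b_late, se_late)
print(f"early slope {b_early:.3f}, late slope {b_late:.3f}, z = {z:.2f}")
```

A large absolute z suggests that the structural parameter differs between the two
stages, that is, that it is not stationary across them. The six cases per stage keep the
example short; the z test itself is only trustworthy in much larger samples.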

A point related to both stability and Condition 3 (specification of causal order) is that
the values on the variables in the structural equations should have reached a state of
approximate constancy before data were collected. This assumption is referred to as the
"equilibrium-type condition" (cf. Namboodiri et al., 1975) and is predicated on the logic
that confirmatory analysis is designed to ascertain if a hypothesized causal model(s) could
have generated a particular set of data. That is, the causal processes are assumed to have
already taken place and their effects to have worked their way through the system so that
the system is in a state of temporary equilibrium. The confirmatory analysis designed to
determine if a model(s) has a good fit with the data is thus essentially inquiring whether
this model(s) could have generated these data. To answer this question requires first that
the causal processes have occurred and that the values on the variables have reached a
state of temporary constancy--an equilibrium-type condition.


Condition 1: Formal statement of theory in terms of a structural (causal) model.

Development of a structural model that specifies variables, causal
connections among variables, and functional relations and equations that
relate each effect to all of its relevant causes.

Condition 2: Theoretical rationale for causal hypotheses.

Use of theory to propose how causes produce effects by introduction of
mediating mechanisms to help to explain nonobvious covariation among
variables and causal connections among complex variables.

Condition 3: Specification of causal order.

Hypothesized order in which variables occur naturally in a causal model,
given an equilibrium-type condition for cross-sectional designs and
specified causal intervals, stationarity, and an equilibrium-type condition for
time series designs.

Condition 4: Specification of causal direction.

Hypothesized direction of causation for each causal connection in a
structural model. The direction may be asymmetric, denoting a recursive
causal relation, or reciprocal, denoting a nonrecursive causal relation.

Condition 5: Self-contained causal equations.

The causal equation for each effect (endogenous variable) in a structural
model contains all the relevant causes of that effect, which is indicated
by lack of covariation between the explicitly measured causes in an
equation and the disturbance term of that equation.

Condition 6: Specification of boundaries.

Given linearity in parameters and variables, the causal equations are
additive within the populations (e.g., subjects and environments) to which
inferences are to be made.

Condition 7: Stability of structural model.

The values of structural (causal) parameters are invariant (stationary)
over specified time intervals, and the values on variables representing
events are in an equilibrium-type condition.

Figure 2. Conditions pertaining to appropriateness of theoretical
models for confirmatory analysis. (Adapted from James,
Mulaik, & Brett, 1982, Figure 2.6, pp. 56-57.)

Estimators of certain types of stability, such as test-retest reliability, provide at
least indirect tests of the equilibrium-type condition (indirect because they require only a
correlational form of reliability). Most important, however, is the concern that
individuals should have been in the Navy and in their positions for a sufficient period of
time to be able to respond meaningfully to the questionnaires. Specifically, whatever
causal influence is indicated by a questionnaire item should have already occurred. It is
suggested that the PDCD work unit consider carefully whether all members of the data
set had been in position for sufficient periods of time for causal effects to have
stabilized. A final point in this regard is the use of the equilibrium-type condition to
establish a causal order in the cross-sectional and longitudinal designs. As discussed in
greater detail in James et al. (1982, pp. 51-54), length of causal intervals, or equilibrium
times, may be used to establish causal orders and to avoid the infinite regress implied by
many open system, dynamic models.

Otherwise, the specification of causal orders required by Condition 3 is largely
provided by theory (cf. Morrison & Cook, 1985). Moreover, it is possible and legitimate to
propose several alternative causal orders a priori, conditional on having good theoretical
reasons for each alternative ordering, and to conduct tests to ascertain which one of the
orderings best fits the data (Billings & Wroten, 1978; see James & Tetrick, 1984, for an
example). In fact, proposing multiple, alternative, theoretically based models for the
same set of data, and contrasting these a priori models in terms of fit with the data, is a
highly recommended approach to confirmatory analysis (cf. James et al., 1982, Chapt. D).
On the other hand, one should not explore different causal orders with the same set of
data in order to find the causal order that has the best fit with the data (Duncan, 1975).
This is never a legitimate exercise and, if attempted, is almost sure to be
heavily criticized. (A middle ground is changing causal orders as part of a specification
analysis. Such changes should be few and theoretically based. Of course, if theoretically
based then they might have been a priori, thereby perhaps obviating the need for a
specification search.)

A final point regarding causal order pertains to the lagged CSTS, where ordering for
some aspects of the causal model is determined by time of measurement (i.e., 1982 or
1986). While use of CSTS reduces at least some possible ambiguities in causal ordering
(e.g., events in 1986 could not have caused events in 1982), there is a price to be paid in
the use of CSTS, or of any form of time series or panel-type design. This price is the
requirement that the times of measurement (measurement intervals, such as the interval
between the 1982 and 1986 waves of measurement) "must correspond closely to the true
causal intervals in a time-series design (Kenny, 1979)" (James et al., 1982, p. 37). It is
recommended, therefore, that the PDCD work unit give special attention to a theoretical
justification for the causal interval for any lagged effect (e.g., a causal connection
between a 1982 variable and a 1986 variable). Moreover, the causal intervals will vary
among individuals in the longitudinal designs (e.g., all sample members took the
questionnaire in 1982, but those who did not continue in the Navy left at different times).
Length of time between questionnaire administration and continuance action should thus
be considered in terms of theoretical implications and perhaps treated as a variable.
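
Treating the interval between questionnaire administration and continuance action as
a variable requires only a date for each event. The following is a minimal sketch under
stated assumptions: the survey date, the separation dates, the record layout, and the
function name are all hypothetical, and whole calendar months are used merely as one
convenient unit.

```python
from datetime import date

def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months

# Hypothetical records: one survey date, and a separation date per officer
# (None means the officer was still serving when the records were pulled).
survey_date = date(1982, 6, 15)
separations = {
    "officer_a": date(1983, 9, 1),
    "officer_b": date(1985, 6, 15),
    "officer_c": None,
}

interval_months = {
    who: (months_between(survey_date, left) if left is not None else None)
    for who, left in separations.items()
}
print(interval_months)
```

The resulting interval can then enter an analysis as a covariate or as a grouping
variable, depending on what the theory says about how quickly intentions translate
into actions.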

The final condition, and perhaps the most salient one, is Condition 5, which requires
that causal equations be self-contained. Statistically, self-containment requires that no
covariation occur between causal variables included explicitly in a structural equation
and the (theoretical) disturbance terms of that equation (James, 1980; James et al., 1982;
Johnston, 1984). Note that this assumption is based on the theoretical disturbance in a
structural model and structural equations and not on the residual or error terms used to
estimate disturbances by statistical analyses. A less statistically oriented approach to
this assumption is to require that all relevant causes of an (or each) endogenous variable
are included in the structural equation for that endogenous variable (James et al., 1982).
A relevant cause is a causal variable that (1) has at least a moderate, direct effect on the
endogenous variable, (2) is stable, (3) is related to at least one other causal variable in the
structural equation, and (4) is not linearly dependent on the other causes in the causal
equation.

The basic idea of self-containment, or its obverse, the unmeasured variables problem,
is that no key causal variable is left out of a causal equation. But, of course, some
omission is unavoidable because current scientific knowledge regarding most endogenous
variables, including career decisions and actions, is incomplete and thus all relevant
causes cannot be considered to be known. Reasonable satisfaction of the self-containment
condition requires that attempts be made to include known relevant causes in structural
equations (James et al., 1982). A set of decision criteria for establishing reasonable
satisfaction of Condition 5 is presented in James (1980). Since these criteria are rather
extensive, they are not reproduced here. However, the James (1980) article is included as
Appendix A.
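
The role of criteria (1) and (3) in the definition of a relevant cause can be seen in a
short derivation. The sketch assumes a two-cause equation in deviation form, a
disturbance d that is unrelated to both causes, and an analysis that omits X_2. The
lowercase b_1 denotes the value approached in large samples by the OLS slope from the
misspecified equation Y = b_1 X_1 + e. These symbols are introduced here for
illustration only.

```latex
\begin{align*}
Y   &= B_1 X_1 + B_2 X_2 + d, \qquad
       \operatorname{Cov}(X_1, d) = \operatorname{Cov}(X_2, d) = 0 \\
b_1 &= \frac{\operatorname{Cov}(X_1, Y)}{\operatorname{Var}(X_1)}
     = \frac{B_1 \operatorname{Var}(X_1) + B_2 \operatorname{Cov}(X_1, X_2)}
            {\operatorname{Var}(X_1)}
     = B_1 + B_2 \, \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
\end{align*}
```

The bias term vanishes if B_2 = 0 (the omitted variable has no direct effect) or if
Cov(X_1, X_2) = 0 (it is unrelated to the included cause), which is why an omitted
variable must meet both criteria before it threatens self-containment.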

General  Statistical  Requirements  for  Confirmatory  Analysis 

The following overview of statistical requirements was obtained from many sources;
principal among these were Bentler and Chou (1987), Duncan (1975), Hayduk (1987), Heise
(1975), Johnston (1984), Joreskog and Sorbom (1986), Kenny (1979), Long (1983a, 1983b),
Namboodiri et al. (1975), and Ostrom (1978). Salient statistical requirements that must be
satisfied by all of the three analytic models (cross-sectional, longitudinal, lagged CSTS)
are presented below. While lengthy, the list is not exhaustive of every possible
requirement, and several important assumptions are addressed in the discussion of
estimation techniques. Moreover, we have not differentiated between assumptions
required for estimation of parameters and assumptions required only for interpretation of
parameters and statistical inference, the logic being that one usually wishes to interpret
what one has estimated.

The equation below is presented to assist in the discussion of assumptions.

Y = A + B_1 X_1 + B_2 X_2 + d                                             (1)

where Y is the endogenous variable, X_1 and X_2 (X_j, j = 1, 2) are causal variables, B_1 and
B_2 (B_j, j = 1, 2) are structural parameters for the X_j in raw-score or deviation-score form
(if Y and the X_j are in standardized form, then the B_j would be path coefficients), A is the
intercept, and d is the disturbance.
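
To make the distinction between unstandardized structural parameters and path
coefficients concrete, the sketch below estimates Equation 1 by OLS from deviation
scores and then rescales each B_j by the ratio of standard deviations. The data are
hypothetical and were generated without a disturbance so that the known parameters are
recovered exactly; the function names are assumptions of this illustration.

```python
import math

def center(v):
    """Convert raw scores to deviation scores."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def cross(u, v):
    """Sum of cross-products of two deviation-score vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def fit_two_cause_equation(y, x1, x2):
    """OLS estimates for Y = A + B_1 X_1 + B_2 X_2 + d, plus path coefficients."""
    yc, x1c, x2c = center(y), center(x1), center(x2)
    s11, s22, s12 = cross(x1c, x1c), cross(x2c, x2c), cross(x1c, x2c)
    s1y, s2y = cross(x1c, yc), cross(x2c, yc)
    det = s11 * s22 - s12 ** 2  # zero if X_1 and X_2 are linearly dependent
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    n = len(y)
    a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    # Path coefficient: B_j rescaled by sd(X_j) / sd(Y).
    syy = cross(yc, yc)
    p1 = b1 * math.sqrt(s11 / syy)
    p2 = b2 * math.sqrt(s22 / syy)
    return a, b1, b2, p1, p2

# Hypothetical data generated as Y = 1 + 2*X_1 - 0.5*X_2 (no disturbance).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u - 0.5 * v for u, v in zip(x1, x2)]

a, b1, b2, p1, p2 = fit_two_cause_equation(y, x1, x2)
print(f"A = {a:.3f}, B_1 = {b1:.3f}, B_2 = {b2:.3f}")
print(f"path coefficients: {p1:.3f}, {p2:.3f}")
```

The unstandardized B_j do not depend on the variances in a particular sample, whereas
the path coefficients do, which is one reason unstandardized data are recommended
later in this section for comparisons across waves and subgroups.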

1. Within subgroups defined by salient moderators, relations represented by the
structural parameters B_1 and B_2 are linear and additive. The issue of nonadditivity or
moderation (or interaction) is, as noted earlier, the subject of Section III, and thus this
issue is not discussed here. Linearity (in the variables) refers to the form of functional
relationship linking the endogenous variable to the causal variables. For example, a
simple bivariate relation is linear if it can be represented by a straight line having the
form Y = A + bX, plus error in stochastic models. Nonlinearity in the variables is often
addressed by polynomial regression equations (cf. Cohen & Cohen, 1983), where one or
more continuous causal variables is (are) raised to powers (typically squared) to represent
nonlinear functions such as U or inverted-U shaped relations in the bivariate case.

2. If X_1 and/or X_2 is a continuous variable, then the scale of measurement is at
least interval. As noted by James et al. (1982), an essentially interval scale reasonably
satisfies this assumption (see Boyle, 1970).

3. The causal variables are perfectly reliable (if variables, including Y, are in
standardized form, then all variables in the model are assumed to be perfectly reliable).
James et al. (1982) suggest that "high" reliabilities reasonably satisfy this assumption, but
note the lack of consensus on a criterion for what constitutes "high." Nevertheless, the
generally high coefficient alphas for the majority of (questionnaire) variables included in
this study suggest that problems due to random measurement errors (e.g., attenuation)
are unlikely to be substantial. Use of latent variable models in future efforts should
reduce the problem even further.

4. The X variables are not linearly dependent. We have already discussed this issue
in regard to its role in multicollinearity.

5. The disturbances have a multivariate normal distribution, where each
disturbance has a mean of zero, and the variances of the disturbances are equal. These are
standard assumptions for statistical techniques such as OLS (ordinary least squares), and
involve well known assumptions such as normal distributions of the Ys within arrays and
homoscedasticity.

6. X_1 and X_2 are nonstochastic or fixed variables. Confirmatory analysis is often
based on an OLS (ordinary least squares) "fixed" regression model wherein Y and d are
random variables and the X_j are fixed variables. This fixed variable regression model is
perhaps better suited to experimental designs where investigators determine discrete
values for each X_j and then randomly sample subjects into these values. Nevertheless,
popular texts such as Cohen and Cohen (1983) are based on the fixed variable regression
model and this model is often used to analyze data where at least some of the X_j are
clearly random variables.

Relaxing this assumption and allowing the X_j to be stochastic or random
variables is necessary given that many if not most of the causal variables in the career
development models are random variables. This is easily accomplished if one is willing to
assume that (a) conditioned on each X (i.e., X_1 and X_2), the disturbances are normally and
independently distributed with means equal to zero and variances equal, and (b) X_1 and X_2
are unrelated to d, which is the self-containment condition discussed earlier in regard to
Condition 5. With these assumptions, the use of traditional OLS procedures will furnish
meaningful estimators and significance tests, especially in large samples (see Cramer &
Appelbaum, 1978; Johnston, 1984, Chapt. 7).

7. Absence of nonrandom measurement errors. A nonrandom measurement error is
a systematic source of bias that, if present, reduces the accuracy with which a manifest
variable represents an underlying construct or latent variable (Namboodiri et al., 1975).
As reviewed in James et al. (1982, p. 58), nonrandom measurement errors involve (a)
aggregation and disaggregation biases, (b) ceiling and floor effects in measurement scales,
(c) classification errors resulting from poor scaling of manifest variables (e.g., reducing a
reliable continuum to a dichotomy), (d) method variance resulting from the fact that two
or more manifest variables share a common measurement procedure and thus are
influenced by common response sets/response biases, and (e) serially correlated errors of
measurement that result from use of the same measurement scale(s) in two or more waves
of data collection.

The career development data, like almost any set of field data collected in part
by questionnaires, are likely subject to several of these types of errors. Aggregation bias is
not a problem as long as individual level data are analyzed with individuals as the unit of
analysis. (Unit and/or macro level variables may be added to these analyses using
techniques discussed by James, Demaree, and Hater, 1980; see Appendix B.) Aggregation
of individual level data and analyses of such aggregate data should proceed only after
careful consideration of issues pertaining to cross-level inference (see Pedhazur, 1982,
pp. 526-547, for a brief and cogent review of the issues).

The investigators should already be aware of ceiling/floor effects and classification
errors that may exist in the data, given their prior scaling efforts. Thus, we proceed
to the question of method variance. Tests for method variance are often based on
application of confirmatory factor analysis (CFA) to various operationalizations of the
multitrait-multimethod matrix (cf. Widaman, 1985; Schmitt & Stults, 1986). Such tests,
however, require that each construct (latent variable) be measured by using at least two
different methods. Generally, this is not an option with the career development data.

A less desirable but applicable alternative often employed by James and
colleagues (e.g., James & Jones, 1980) is designed to test whether a pervasive method
factor has biased questionnaire data. To illustrate the use of this procedure, suppose we
have three constructs, labelled A, B, and C. All constructs are measured by the same
procedure (e.g., a questionnaire). Theory may suggest a high correlation between A and B.
Suppose a high correlation is obtained. Suppose further that a critic argues that this high
correlation is primarily a product of method variance (i.e., a pervasive method factor
created a spurious correlation between A and B). A test of the critic's argument is
provided by introducing variable C, where (1) C is measured in the same manner as A and
B, (2) C is subject to the same response sets/styles as A and B (e.g., acquiescence), (3) C
has psychometric characteristics that are similar to those of A and B, and (4) C theoretically
has low relationships with A and B. Now, with these conditions, high correlations between C
and both A and B imply a pervasive method factor. However, low correlations between
C and both A and B suggest the absence of a pervasive method factor, and thus the high
correlation between A and B cannot be totally spurious. Other levels of correlation
between C and both A and B suggest varying levels of partial spuriousness engendered by a
pervasive method factor.

The final concern in regard to nonrandom measurement errors is correlated
measurement errors. Such correlations can be easily checked in future analyses on the
CSTS models that employ latent variable designs (cf. Joreskog & Sorbom, 1986, and
Section IV).
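
The marker-variable check for a pervasive method factor described under requirement 7
reduces to comparing three correlations. A minimal sketch follows, assuming
hypothetical scale scores for constructs A, B, and C and an arbitrary cutoff of .20
for a "low" correlation; neither the cutoff nor the function names come from James and
Jones (1980).

```python
import math

def pearson_r(u, v):
    """Product-moment correlation between two score vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def method_factor_check(a, b, c, low=0.20):
    """Compare r(A,B) with the correlations of marker variable C with A and B."""
    r_ab, r_ac, r_bc = pearson_r(a, b), pearson_r(a, c), pearson_r(b, c)
    return {
        "r_ab": r_ab,
        "r_ac": r_ac,
        "r_bc": r_bc,
        # Low correlations with C argue against a pervasive method factor.
        "pervasive_method_factor_unlikely": abs(r_ac) < low and abs(r_bc) < low,
    }

# Hypothetical scale scores from the same questionnaire for eight respondents.
scores_a = [1, 2, 3, 4, 5, 6, 7, 8]
scores_b = [2, 1, 4, 3, 6, 5, 8, 7]
scores_c = [3, 5, 4, 2, 2, 4, 5, 3]  # theoretically unrelated to A and B

result = method_factor_check(scores_a, scores_b, scores_c)
print(result)
```

Intermediate values of the correlations with C would call for judgment rather than a
yes/no flag, in keeping with the "varying levels of partial spuriousness" noted above.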


Additional assumptions for cross-sectional time-series. In addition to the above,
use of (lagged) cross-sectional time series requires reasonable satisfaction of the
following assumptions. We present these assumptions using the lagged CSTS model in
Figure 3 as a guide. In Figure 3, d_0 and y_0 represent theoretical measurements that are
included to denote that the time t data cannot be analyzed by themselves without
creating a serious unmeasured variables problem. (Please note the implications of this
point for Figure 1d.) Time t is analogous to the 1982 wave data, whereas Time t + 1
represents the 1986 wave data. Time t + 2 refers to a future wave of data collection.
The structural equations for Figure 3 are (variables are assumed to be in deviation form):


y_{t+1} = B_{y,t} y_t + B_{1,t+1} x_{1,t+1} + B_{2,t+1} x_{2,t+1} + d_{t+1}              (2)

y_{t+2} = B_{y,t+1} y_{t+1} + B_{1,t+2} x_{1,t+2} + B_{2,t+2} x_{2,t+2} + d_{t+2}          (3)

Figure  3.  An  example  of  a  lagged,  cross-sectional,  time-series  model. 


The equations state that y (e.g., continuance intention) is a function of y at a prior time
and contemporaneously assessed exogenous variables x_1 and x_2. Note that no equation
exists for y_t, which again is due to unavailability of y_0.
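
With two waves of data, only the equation for time t + 1 can be estimated. The sketch
below shows one way to obtain OLS estimates for an equation of that form (a lagged y
plus two contemporaneous exogenous variables) from unstandardized deviation scores by
solving the normal equations. The data are hypothetical and contain no disturbance, so
the generating parameters are recovered exactly; the helper names are assumptions of
this illustration, and the sketch says nothing about the autocorrelation problem
discussed below.

```python
def center(v):
    """Convert raw scores to deviation scores."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [mr - f * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def ols_deviation(y, causes):
    """OLS structural parameters with all variables in deviation form."""
    yc = center(y)
    xs = [center(x) for x in causes]
    xtx = [[sum(p * q for p, q in zip(xi, xj)) for xj in xs] for xi in xs]
    xty = [sum(p * q for p, q in zip(xi, yc)) for xi in xs]
    return solve(xtx, xty)

# Hypothetical two-wave data for six officers.
y_t = [1, 2, 3, 4, 5, 6]      # e.g., continuance intention at time t
x1_t1 = [1, 0, 1, 0, 1, 0]    # exogenous variable 1 at time t + 1
x2_t1 = [2, 1, 0, 0, 1, 2]    # exogenous variable 2 at time t + 1
y_t1 = [0.5 * a + 2 * b - 1 * c for a, b, c in zip(y_t, x1_t1, x2_t1)]

b_y, b_1, b_2 = ols_deviation(y_t1, [y_t, x1_t1, x2_t1])
print(f"lagged y: {b_y:.3f}, x1: {b_1:.3f}, x2: {b_2:.3f}")
```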

The  assumptions  unique  to  this  lagged  CSTS  are: 

8. Times of measurement correspond to causal intervals, which has been discussed.

9. The model is stationary, which, based on prior discussion, would be indicated by
B_{y,t} = B_{y,t+1}, B_{1,t+1} = B_{1,t+2}, and B_{2,t+1} = B_{2,t+2} in Equations 2 and 3.

10. Disturbances are nonautoregressive, which means that no covariation exists
between d_{t+1} (Equation 2) and d_{t+2} (Equation 3).

Without the time t + 2 measurements, there is no way to test for stationarity (a
test for stationarity has been provided by James & Tetrick, 1984). Moreover, given the
likelihood of unmeasured causal variables, it is probable that Assumption 10 above will be
violated (see James & Singh, 1978, Figure 8). This is because lack of autocorrelation
between d_{t+1} and d_{t+2} (or between d_t and d_{t+1}; see Figure 3) presumes that the
disturbances are composed of random shocks (or are a white noise series; cf. Johnston,
1984, p. 371). If this is the case, then the structural parameters in Equation 2--the only
estimable equation given two waves of measurement--can be estimated directly with no
further ado.

However, consider now that unmeasured relevant causes reside in the disturbance
terms (cf. James, 1980), and it is these unmeasured relevant causes that are, in
part, responsible for the autocorrelation of the disturbances (the curved arrows between
the d's in the model). Inasmuch as no field model is self-contained, it follows that the
disturbances will be autocorrelated. Straightforward estimation is no longer possible.
Various complex forms of instrumental variables, generalized least squares, or maximum
likelihood (Johnston, 1984, Chapt. 9) are required. This is a moot point, however, because,
without a third wave of data, most of these complex forms of analyses cannot be
implemented. Consequently, the investigators will have to decide whether the key, known
relevant variables are included explicitly in their lagged CSTS equations having the form
of Equation 2. If this is believed to be the case, then they may proceed to estimate
parameters. These estimates will be both biased and inconsistent, and significance tests
will be more powerful than they should be (see Ostrom, 1978). Nevertheless, these
problems should not be of great magnitude, or at least not of sufficient magnitude to
preclude analyses.
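
The reason autocorrelated disturbances are troublesome in a lagged model can be
sketched in two lines. Suppose the (inestimable) equation for y at time t has the same
form as the time t + 1 equation, with y_0 as the unavailable prior measurement, and
suppose y_0 and the time t exogenous variables are unrelated to the later disturbance
d_{t+1}. The coefficient labels below are placeholders chosen for this illustration.

```latex
\begin{align*}
y_t &= B_{y,0}\, y_0 + B_{1,t}\, x_{1,t} + B_{2,t}\, x_{2,t} + d_t \\
\operatorname{Cov}(y_t, d_{t+1}) &= \operatorname{Cov}(d_t, d_{t+1}) \neq 0
  \quad \text{when the disturbances are autocorrelated}
\end{align*}
```

Because y_t is an explicit cause in the time t + 1 equation and d_{t+1} is that
equation's disturbance, a nonzero covariance between them violates the
self-containment condition, which is the source of the bias and inconsistency noted
above.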

Statistical  Estimation  Procedures 


Most of the career development models involve continuous variables up to the point
of the final, endogenous action or outcome variable. The action or outcome variable may
be continuous, as in the case of the upward mobility measures, dichotomous, as in the case
of continuance, or nonordered and discrete (i.e., qualitative), which applies to
occupational development. In the last case--the final analyses involving the occupational
development action variables--a multiple discriminant analysis (MDA) will likely be in
order. For the continuous upward mobility variable and the dichotomous continuance
variable, we suggest the use of OLS or, preferably, maximum likelihood (ML) analyses.
Later, in more sophisticated analyses, the dichotomous continuance variable can be
subjected to such things as event history analysis (cf. Allison, 1984) and/or logit analysis
(cf. Berry & Lewis-Beck, 1986).
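
For readers unfamiliar with logit analysis, the following is a minimal sketch of how a
dichotomous continuance variable might be related to a single cause by maximum
likelihood. It uses a hand-written Newton-Raphson routine for a one-predictor model;
the data, variable coding, and function name are hypothetical, and an actual analysis
would use an established statistical package and more than one cause.

```python
import math

def fit_logit(x, y, iterations=25):
    """Newton-Raphson ML estimates (a, b) for P(y = 1) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian of the log-likelihood).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 ** 2
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: x = 1 if the officer reported an intention to stay,
# y = 1 if the officer was still in the Navy at follow-up.
x = [0] * 10 + [1] * 10
y = [1] * 4 + [0] * 6 + [1] * 7 + [0] * 3

intercept, slope = fit_logit(x, y)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

With a single dichotomous cause, the ML slope equals the log of the odds ratio in the
2 x 2 table, which provides a convenient check on the routine.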

With the exception of the use of MDA to complete the analyses on occupational
development, the statistical estimation issue boils down to whether one is going to use
single equation estimation techniques (OLS) or full-information estimation techniques
(LISREL).

To address this issue, consider the structural equations for the cross-sectional model
presented in Figure 1a (variables are in deviation form):


y_1 = b_{y1x1} x_1 + b_{y1x2} x_2 + d_{y1}                                  (4)

y_2 = b_{y2x1} x_1 + d_{y2}                                                 (5)

y_3 = b_{y3y1} y_1 + b_{y3y2} y_2 + d_{y3}                                  (6)

A single equation estimator such as OLS could be used to estimate the structural
parameters in each of the three equations. The term "single equation estimator" denotes
that a separate OLS analysis is conducted for each equation and thus the estimating
process for one equation is independent of the estimating process for another equation.
Consequently, specification errors that engender bias or inconsistency in one equation do
not spread over and affect the bias or consistency of estimates in another equation (unless
the second equation is subject to the same specification errors based on its own lack of
merits).

In contrast, a full-information estimator, such as the full-information ML procedures
used in LISREL (Joreskog & Sorbom, 1986), would estimate all the structural parameters
in Equations 4 through 6 simultaneously. While more efficient, the full-information
techniques suffer the problem that specification error in one equation can spread over and
affect estimates in a different equation (cf. MacCallum, 1986). On the other hand, a
salient benefit of full-information techniques is the opportunity to test the overall fit of
the model to the data. In this regard, we strongly recommend reading Wheaton (1987).
Moreover, use of the full-information techniques in LISREL will assist substantially in
proceeding to the moderator analyses discussed in Section III.

In sum, either OLS or LISREL may be used for estimation purposes. We recommend
LISREL, which generally means the use of full-information ML to analyze manifest
variable structural models. Checks may be made to compare the LISREL estimates to
OLS estimates. If the estimates differ, then a potential culprit is the spreading of
specification errors by the full-information technique. If this appears likely, then the OLS
estimates would be preferable.

The statistical assumptions required to employ OLS are as discussed, with one
addition. The addition is that each equation must be identified. Identification refers to
the question of whether sufficient information is available to obtain unique mathematical
estimates of structural parameters (cf. James et al., 1982). Recursive equations based on
manifest variables are generally identified and thus we will not pursue this issue here.

For full-information maximum likelihood (FIML), one must assume that the Xs and Ys
are distributed multivariate normal. In addition, identification must be established for
each parameter (cf. Long, 1983b). The identification issue should again not be a problem.

There are many additional issues that will occur during statistical analyses. We
prefer to deal with these issues on an interactive basis with the investigators as they
arise. On the other hand, we do wish to reiterate several points raised in Report 1 that
are germane to estimation. These points are (1) use of hold-out samples for
cross-validation purposes (cf. Cudeck & Browne, 1983); (2) avoidance of the use of change
scores; (3) use of nonstandardized data in analyses, especially the lagged CSTS; and (4)
development of comparison and generalization samples (see p. 22, Report 1) for lagged
versus nonlagged analyses and for analyses based on selected samples (e.g., selection of
equal numbers of stayers and leavers).
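
The first of these points, the use of hold-out samples, can be illustrated with a short
sketch: parameters are estimated in a calibration half of the sample and then applied,
unchanged, to predict the criterion in the hold-out half. The split fraction, the random
seed, the single-cause equation, and the data are all assumptions made for illustration;
cross-validation of a full covariance structure model (Cudeck & Browne, 1983) involves
more than is shown here.

```python
import math
import random

def fit_line(x, y):
    """OLS intercept and slope for a single-cause equation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def pearson_r(u, v):
    """Product-moment correlation between two score vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def split_sample(n, holdout_fraction=0.5, seed=0):
    """Randomly assign case indices to calibration and hold-out samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - holdout_fraction)))
    return idx[:cut], idx[cut:]

# Hypothetical data: 20 cases, one cause (x) and one criterion (y).
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + (0.3 if i % 2 == 0 else -0.3) for i, xi in enumerate(x)]

calib, hold = split_sample(len(x))
a, b = fit_line([x[i] for i in calib], [y[i] for i in calib])

# Apply the calibration estimates, unchanged, to the hold-out cases.
predicted = [a + b * x[i] for i in hold]
cross_validity = pearson_r(predicted, [y[i] for i in hold])
print(f"cross-validity correlation = {cross_validity:.3f}")
```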

Statistical  Specification  Errors 

Section II is concluded with a listing of errors that occur often in practice during
estimation and model fitting (cf. Bentler & Chou, 1987; Billings & Wroten, 1978). We
focus on issues that were not addressed in prior discussions of conditions for causal
modeling, statistical assumptions, and estimating techniques. We recommend Bentler and
Chou (1987) for elaboration on points 4 through 7.

1. Sample sizes that are too small for stable statistical results. This concern may
arise in the career development study as a result of subgrouping, which is to say that one
or more of the subgroups defined by salient moderators is too small. While no clear-cut
criterion exists for defining "small" (there are many heuristics, however), our experience
suggests that attempts be made to keep sample sizes above 200.

2. Restriction of variance on the criterion. Low variance on a criterion
(endogenous variable), which is often associated with a skewed distribution, is associated
with problems in trying to predict/explain occurrences of the criterion, especially if data
are standardized (e.g., path analysis). This problem may be a result of naturally occurring
events, such as low base rates, or induced, such as restriction of range due to incidental
selection. Remedies include the use of correction equations, the use of unstandardized
data, and the use of samples selected to remove base-rate problems.

3. Presence of outliers in the data. Outliers may or may not affect various aspects
of  analyses.  An  article  by  Stevens  (1984)  has  a  good  review  of  procedures  for  detecting 
outliers. 


4. Use of distribution free methods on small samples. Bentler and Chou (1987)
recommend that unweighted least squares (ULS, a full-information method in LISREL) be
used only when n > 200.

5. Failure to use multiple test and fit criteria to evaluate a causal model. It has
become apparent to many authors that a model should be subjected to multiple tests and
evaluated with multiple fit indices (see Wheaton, 1987).

6. Use of significance tests on standardized data in LISREL. The chi-square
significance testing procedures are designed for unstandardized data only (Bentler &
Chou, 1987).

7. Failure of estimation procedure to converge in LISREL. Bentler and Chou (1987)
suggest that failure to converge may be due to (a) a nonlinear model that is treated as if
linear, (b) a very poor initial model, (c) poor start values for parameters, (d) unreasonable
equality constraints, and (e) unidentified initial parameters.


SECTION  III.  MODERATOR  ANALYSES 

A  major  issue  for  the  project  staff  is  the  likelihood  that  causes  of  major  variables 
related  to  Navy  career  decisions  may  differ  across  subcommunities,  different  time 
periods, officer ranks, cohorts, and other variables. Our discussions with the principal
investigator and his staff have made it clear that detection of such moderator variables is
a crucial and primary goal of the research. Detection of moderator variable influences
is obviously necessary for accurate projection of future trends and complete understanding
of officer career development processes.

Analysis of moderator influences can proceed using either the ordinary least squares
(OLS)  or  maximum  likelihood  (ML)  approaches  to  manifest  variable  designs.  There  are 
two  chief  design  features  to  consider  in  designing  the  analysis.  The  first  is  whether  the 
moderator  variable  is  by  nature  a  categorical  or  continuous  variable.  The  second  is 
whether  the  hypothesized  locus  of  moderation  requires  testing  of  moderator  influences 
between  independent  samples  or,  alternatively,  tests  of  moderator  influences  between 
different  equations  within  the  same  sample  (primarily,  in  testing  differences  in  regression 
equations  in  lagged  panel  data). 

Categorical Moderator Variables

Independent  Groups  Analyses 

Analysis  of  moderator  variables  is  simple  and  straightforward  if  the  moderator 
variable  is  a  naturally  occurring  categorical  variable,  such  as  officer  cohort.  Here  we 
assume that (1) the variables involved in the regression equations are equivalently
measured  across  levels  of  the  categorical  moderator,  (2)  there  is  sufficient  sample  size  at 
each  level  of  the  moderator  variable  to  permit  meaningful  statistical  analysis  in  each 
subgroup,  and  (3)  the  analysis  is  to  be  done  with  metric  regression  coefficients.  Condition 
(1)  would  be  violated  in  many  instances  in  comparisons  of  subcommunities,  where 
different  variables  are  measured  and  where,  in  some  instances,  variables  have  a 
materially  different  interpretation  in,  say,  aviators  than  in  surface  warriors.  In  such 
cases,  formal  moderator  analysis  is  not  warranted.  Analysis  would  proceed  independently 
for  each  subcommunity,  but  the  regression  equations  would  not  be  analytically  tested  for 
equivalence across subcommunities. Condition (3) is crucial. In general, one does not
wish to estimate the moderator effects in groups where separate standardization of
variables has occurred. Separate standardization reduces the likelihood of cross-validation
of regression coefficients, in general. In the case of moderator analysis, it is
inappropriate to test for interaction if different transformations have been applied to
different groups. Separately standardizing the groups is one such case. Applying
different transformations can introduce or obscure interaction effects. Thus, the analysis
cannot be done by analyzing correlation matrices for each of the groups. This would
generally not be done in the multiple regression approach, in which group membership is
treated as a variable and data from the entire sample are analyzed. Separate standardizations
could easily be requested when using LISREL to do the simultaneous equations
approach. This is inappropriate, and the analysis should be conducted on covariance
matrices  of  the  manifest  variables.  It  is  perfectly  acceptable  to  standardize  the  variables 
for  construction  of  composites,  but  this  standardization  must  be  done  on  the  pooled  data 
prior  to  segregation  into  groups  for  moderator  analysis. 
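
The effect of separate versus pooled standardization can be illustrated with a minimal sketch. The data, function names, and the two-group setup below are hypothetical and are not drawn from the project data sets; the sketch only shows that standardizing within each group can erase a real difference in metric slopes, whereas a single pooled transformation preserves it.

```python
from statistics import mean, pstdev


def slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def rescale(values, center, scale):
    """Apply one linear transformation: subtract center, divide by scale."""
    return [(v - center) / scale for v in values]


# Hypothetical data: the metric slope is 2.0 in group A and 1.0 in group B.
x_a, y_a = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
x_b, y_b = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]

# Separate standardization: each group receives its own transformation.
sep_a = slope(rescale(x_a, mean(x_a), pstdev(x_a)),
              rescale(y_a, mean(y_a), pstdev(y_a)))
sep_b = slope(rescale(x_b, mean(x_b), pstdev(x_b)),
              rescale(y_b, mean(y_b), pstdev(y_b)))

# Pooled standardization: one transformation, computed before splitting.
x_all, y_all = x_a + x_b, y_a + y_b
mx, sx = mean(x_all), pstdev(x_all)
my, sy = mean(y_all), pstdev(y_all)
pool_a = slope(rescale(x_a, mx, sx), rescale(y_a, my, sy))
pool_b = slope(rescale(x_b, mx, sx), rescale(y_b, my, sy))

print("separate:", sep_a, sep_b)          # the group difference disappears
print("pooled ratio:", pool_a / pool_b)   # the 2:1 ratio of slopes survives
```

In this contrived case the separately standardized slopes coincide even though the groups differ, which is the kind of obscured interaction the paragraph above warns against.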

The  moderator  analysis  proceeds  in  two  different  ways,  depending  upon  whether 
separate  regression  equation  or  simultaneous  estimation  approaches  are  employed  (see 
Section  II). 

Separate  Regression  Equations.  Moderator  analysis  proceeds  by  using  product 
variables and hierarchical regression techniques (Cohen & Cohen, 1983; Pedhazur, 1982).
If all exogenous and endogenous variables in the regression equations are continuous
variables (excepting the moderator(s)), then product variables are created by multiplying
the continuous variables by a set of coded vectors representing category membership. A
categorical moderator with m levels will require m - 1 coded vectors, unless more
restrictive a priori hypotheses about moderation are to be entertained. We generally
favor orthogonal coding for representation of moderators, although other coding
approaches can be used. A separate three-stage hierarchical regression is then performed
for each regression equation from the overall model. In stage 1, all independent variables
for the equation are entered and an R² and regression coefficients are estimated. In
stage 2, the coded vectors representing the moderator groups are added to the equation.
In stage 3, the product variables are added. The significance of the increment to R² from
stage 2 to stage 3 is the critical test of whether there is interaction between the
moderator variable and the other variables entered in stage 1. The appropriate statistical
test is the traditional F-test for the increment to R². It can be requested directly in
some statistical packages (e.g., SPSSx Regression, which we recommend generally for
hierarchical regression because of ease of interpretation of output). As discussed rather
nicely by Pedhazur (1982), significant interactions, if present, mandate calculation and
comparison  of  separate  regression  equations  for  each  group  (categorical  level  of  the 
moderator;  see  below).  In  the  absence  of  moderation,  the  common  (pooled)  regression 
coefficients  estimated  at  stage  1  in  the  analysis  may  be  used  as  estimates  of  effects. 
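
The stage 2 to stage 3 test can be sketched as follows. This is an illustration under stated assumptions, not output from any package: the function names, the R² values, the sample size, and the orthogonal codes are all hypothetical. The F statistic is the standard one for an increment to R², with predictor counts excluding the intercept.

```python
def product_terms(predictors, codes):
    """Product variables for one case: each continuous predictor
    multiplied by each coded vector value for that case's group."""
    return [x * c for x in predictors for c in codes]


def f_increment(r2_reduced, r2_full, n, k_reduced, k_full):
    """F-test for the increment to R-squared when predictors are added.

    k_reduced and k_full count predictors, excluding the intercept.
    Returns (F, numerator df, denominator df).
    """
    df_num = k_full - k_reduced
    df_den = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Hypothetical case: two continuous predictors, a moderator with m = 3
# levels (hence m - 1 = 2 coded vectors), and a member of a group whose
# orthogonal codes are assumed to be (-1, 1).
print(product_terms([2.0, 5.0], [-1, 1]))   # four product variables

# Stage 2 has 2 predictors + 2 coded vectors = 4 terms;
# stage 3 adds 2 x 2 = 4 product variables, for 8 terms.
f_value, df_num, df_den = f_increment(0.30, 0.34, n=300, k_reduced=4, k_full=8)
print(f_value, df_num, df_den)
```

The resulting F would be referred to an F table with the printed degrees of freedom; a significant value would indicate moderation for that equation.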

The analysis can become quite cumbersome if (1) multiple moderator variables must
be considered simultaneously, and (2) there are many levels of each moderator. Our
assessment  of  the  data  set  is  that  this  is  not  generally  the  case,  and  that  the  use  of 
hierarchical  regression  approaches  will  prove  satisfactory  in  many  cases. 

Simultaneous  Equation  Analysis.  If  full  information  maximum  likelihood  (FIML) 
approaches  have  been  used,  then  an  alternative  approach  to  moderator  analysis  can  be 
executed by using LISREL VI or VII (Jöreskog & Sörbom, 1984). Although there are other
excellent FIML programs for structural regression models (such as Bentler's EQS program),
LISREL is the only program currently containing the option to analyze regression
equations simultaneously in multiple groups. Henceforth, we shall discuss the simultaneous
equation approach presuming use of LISREL, but the work group should be aware
that BMDP, distributors of EQS, have issued pre-release publicity about a version 3.0 of
EQS that apparently will handle multiple groups analyses. Thus, the moderator analysis
approaches described below may, in the near future, be possible in EQS.

The multiple groups approach is the basis for testing moderator variables in LISREL.
One  begins  by  cutting  the  categorical  variable  into  mutually  exclusive  groups  and 
specifying  the  causal  model  in  each  group.  Then  the  appropriate  test  of  moderation  is 
whether  the  unstandardized  regression  coefficients  are  equal  across  the  multiple  groups. 
This  hypothesis  is  easily  tested  in  LISREL  by  testing  a  model  in  which  the  regression 
coefficients  are  constrained  equal. 

The  formal  statistical  test  of  moderation  requires  two  separate  models.  In  the  first 
model,  one  simply  runs  the  regression  analysis  simultaneously  in  each  group.  Assuming 
that the model is overidentified, then this analysis produces a likelihood ratio (LR) χ² test
of the goodness of fit of the regression model to the data. The LR test is, in essence, the
sum of the LR across all the moderator groups. Then one runs a second model that
specifies the exact same regression model but also specifies that the regression
coefficients are equal in the multiple groups. This second model is said to be nested in
the first model, because it has the same basic specification but imposes the additional
constraint that the coefficients, which are free to vary between groups in the first model,
are required (constrained) to be equal in the second model (e.g., Dwyer, 1983; Hayduk,
1987; Jöreskog & Sörbom, 1979; Long, 1983b). The LISREL program estimates the
common regression coefficients but also produces a new LR χ². Because the two models
are nested, the difference in LR χ²s is a formal test of the null hypothesis that all
regression coefficients are equal in all groups. The LR χ² will be at least as large for the more
restricted model with equality constraints. However, if there is no moderation, so that
the regression coefficients are truly equal across the subpopulations, then the two LR χ²
tests will be approximately equal, except for sampling error (i.e., the difference in the
two χ²s will be approximately equal to the difference in degrees of freedom (df)). Thus,
the test for moderation is to calculate the difference in LR χ², calculate the difference
in df (which should equal the number of regression coefficients times m - 1, where m is the
number of groups), and evaluate the LR χ² difference against a critical value of the χ²
distribution.  It  must  be  emphasized  that  this  test  of  moderation  is  a  multivariate 
significance  test  of  moderation  across  all  regression  equations. 
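
The arithmetic of the nested-model test is simple enough to sketch. The fit statistics below are invented for illustration and are not LISREL output; the function names are hypothetical, and the critical value is taken from a standard χ² table (df = 6, alpha = .05).

```python
def lr_difference(chi2_free, df_free, chi2_equal, df_equal):
    """Difference in LR chi-square and df between the model with equality
    constraints (chi2_equal) and the model without them (chi2_free)."""
    return chi2_equal - chi2_free, df_equal - df_free


def expected_df_difference(n_coefficients, n_groups):
    """Number of equality constraints: coefficients times (m - 1)."""
    return n_coefficients * (n_groups - 1)


# Hypothetical fit statistics for m = 3 groups and 3 regression coefficients.
delta_chi2, delta_df = lr_difference(chi2_free=10.2, df_free=6,
                                     chi2_equal=25.9, df_equal=12)

# Check that the df difference matches the number of constraints imposed.
assert delta_df == expected_df_difference(n_coefficients=3, n_groups=3)

CRITICAL_CHI2_DF6_05 = 12.592  # tabled chi-square value, df = 6, alpha = .05
moderation_detected = delta_chi2 > CRITICAL_CHI2_DF6_05
print(delta_chi2, delta_df, moderation_detected)
```

Checking the df difference against the expected number of constraints is a cheap guard against a misspecified constrained model.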

Use of the LISREL approach to moderator variables is actually quite efficient. One
does not need to generate coded vectors, product variables, and test hierarchical
increments to R². One gets a single, overall test of moderation across all equations. This
efficiency in the statistical test may be a curse rather than a blessing, however, if it is
expected in advance that the moderator variable affects relatively few of the overall
number of regression equations. In that case, the Type II error rate of the overall LR χ²
test for the few equations that are truly different across groups will be higher than in
single  equation  approaches.  On  the  other  hand,  the  simultaneous  approach  provides  better 
control  of  the  Type  I  error  rate  across  all  equations  than  the  separate  equation  approach. 
This  is  not  merely  a  function  of  the  fact  that  an  overall  LR  test  is  computed  (as  opposed 
to  separate  F-tests  for  each  equation).  Given  that  the  same  independent  variables  are 
usually  present  in  multiple  regression  equations  (which  is  the  case  if  endogenous  variables 
are  specified  to  have  both  direct  and  indirect  effects),  the  separate  regression  equations 
will not be statistically independent. Although the regression coefficients in a single
equation are independent across multiple, independent groups, the regression coefficients
will have nonzero covariances of estimate across different equations owing to shared
independent variables. The LISREL approach takes these covariances of estimate into
account in calculating the overall LR test. The separate regressions approach does
not.


It is possible to get a separate LR test for each equation that is an exact logical
analog of the F-test for each equation in the separate equation approach. Again, the LR
χ² test is superior in that the covariances of estimate are still used in calculating the
statistical test of fit. This is done by imposing the equality constraints on only one
equation at a time, and then calculating the difference in χ² against the model with no
equality constraints on any equation. Moreover, it is possible to specify a priori that
there  will  be  moderation  on  a  subset  of  equations  and  to  test  a  LISREL  model  specifying 
moderation  only  on  these  equations.  In  the  case  of  mixtures  of  equality  constraints  and  no 
equality  constraints  across  equations,  the  specification  of  the  model  is  somewhat  more 
complicated.  One  has  to  specify  the  equality  constraints  parameter  by  parameter,  but 
this  is  by  no  means  a  major  obstacle. 

Frankly, the use of LISREL to test moderator influences is relatively new, and has
not been widely discussed in the literature. Hayduk (1987), for example, does treat the
issue  of  "stacked  models"  but  devotes  more  space  to  the  issue  of  simultaneous  models  for 
means  and  covariance  structures  than  to  the  implications  of  testing  equivalence  of 
structural  coefficients  across  multiple  groups.  When  moderator  analysis  in  LISREL  is 
discussed,  it  is  usually  at  the  level  of  latent  variable  rather  than  manifest  variable 
designs.  There  is  insufficient  simulation  data  to  evaluate  differences  between  the  two 
methods (separate equation hierarchical regression and LISREL). Our limited experience
suggests  that  the  two  methods  produce  quite  similar  results  in  recursive  models. 
Additionally,  there  are  advantages  for  simultaneous  approaches  such  as  LISREL  above  and 
beyond  efficiency.  They  are  also  appropriate  for  nonrecursive  models  and  for  models  with 
correlated regression residuals, both of which are poorly handled by single equation
procedures (with or without moderator variables). The specification of the equality test
in LISREL is also very simple. By far the most difficult problem is specifying the base
model, although this is also relatively straightforward in LISREL analysis with manifest
variables.  We  can  provide  a  sample  LISREL  specification  of  the  equality  test  upon 
request. 

Post-hoc  Comparisons.  The  significance  tests  for  either  the  separate  equation 
approach  or  the  simultaneous  equation  approach  are  tests  of  what  might  be  termed 
"omnibus"  null  hypotheses  (i.e.,  no  moderator  effects  on  any  independent  variable  in  the 
equation).  If  the  null  hypothesis  is  rejected,  the  alternative  hypothesis  is  that  not  all 
regression  coefficients  are  equal  across  all  groups.  At  that  point,  a  second  analysis  is 
required. The purpose of this analysis is to determine (1) which moderator groups differ
on  (2)  which  regression  coefficients.  Without  prior  theory,  this  approach  entails  post-hoc 
comparisons  of  regression  coefficients  across  groups.  The  logic  of  the  post-hoc  analysis  is 
exactly  the  same  as  the  more  familiar  post-hoc  tests  of  means  in  ANOVA. 

We  will  describe  a  general  approach  for  testing  the  moderator  effects  across 
independent  groups.  It  should  be  noted  that  this  approach  does  not  take  into  account  the 
full  covariance  matrix  of  the  regression  estimates  in  calculating  specific  error  terms  for 
post-hoc  comparisons.  This  approach  is  easily  applied  for  post-hoc  detection  of  moderator 
effects  with  both  the  single  equation  and  simultaneous  equation  approaches.  The  analysis 
proceeds  from  the  parameter  estimates  from  the  regression  equations  of  each  group.  In 
the  single  equation  approach,  one  must  first  calculate  separate  regression  equations  in 
each  moderator  group  (which  had  not  been  done  prior  to  the  detection  of  significant 
interaction). In the LISREL approach, one uses the estimated regression coefficients and
standard errors from the model that imposed no equality constraints on the parameters.
Here we assume that one has available (1) regression coefficients for all equations for all
groups, and (2) standard errors of estimate for all coefficients. It is possible to compare
any  pair  of  regression  coefficients  by  use  of  the  following  formula: 

t = (b_1 - b_2) / se_{comp},

where t is a t-test, b_1 and b_2 are regression coefficients, and se_{comp} is the standard error
for the comparison. The formula can be generalized to any linear combination of
regression coefficients (see below). The standard error for the comparison, when the
regression coefficients are derived from independent samples, is

se_{comp} = [var est(b_1) + var est(b_2)]^{1/2},

where  var  est  is  the  variance  of  estimate  for  the  regression  coefficient.  Most  regression 
packages  report  both  the  b-weight  and  the  standard  error  of  the  b-weight  (which  is  the 
square  root  of  the  variance  of  estimate),  so  the  comparison  and  the  standard  error  of  the 
comparison are easily calculated. LISREL reports the ML parameter estimates and, upon
user request, their asymptotic standard errors. Parenthetically, it should be noted that
LISREL's t-values test the null hypothesis that the population parameter is equal to zero.
They are not the t-test of the difference in regression coefficients over moderator groups
described  above. 
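
The pairwise comparison described above can be sketched in a few lines. The coefficients and standard errors below are hypothetical, and the function name is ours; the computation assumes, as the text does, that the two coefficients come from independent samples.

```python
import math


def compare_coefficients(b1, se1, b2, se2):
    """t statistic for the difference between two regression coefficients
    estimated in independent groups.

    se1 and se2 are the reported standard errors, i.e., the square roots
    of the variances of estimate.
    """
    se_comp = math.sqrt(se1 ** 2 + se2 ** 2)
    return (b1 - b2) / se_comp


# Hypothetical b-weights for the same predictor in two moderator groups.
t_value = compare_coefficients(b1=0.50, se1=0.10, b2=0.20, se2=0.10)
print(t_value)
```

The resulting t is then evaluated against whatever critical value is adopted for the set of post-hoc comparisons.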

The  problem,  of  course,  is  that  there  is  a  very  large  number  of  such  comparisons  that 
can  be  made  as  the  number  of  equations,  independent  variables  per  equation,  and 
moderator  group  levels  increase.  Practically,  it  can  become  quite  tedious  to  calculate  the 
t-test for all comparisons. From a statistical inference perspective, the more important
problem is protection of the Type I error rate across multiple comparisons. Corrections
for all possible pair-wise combinations of regression coefficients would probably be too
conservative (have too high a Type II error rate). In our opinion, the best approach is to
employ a Bonferroni correction on the critical value for t used in evaluating the
comparisons. The Bonferroni approach maximizes statistical power while controlling the
Type I error rate (see Ramsey, 1982). With the Bonferroni approach, one adjusts the
critical value of t according to the actual number of comparisons to be entertained.
Maximum  power  is  achieved  by  sequential  adjustment  of  the  critical  value,  but  this  is 
tedious  in  practice. 
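
A minimal sketch of the simultaneous and sequential adjustments follows. Two assumptions should be flagged: the sketch works with p-values rather than critical values of t (the two are equivalent for a given df), and it reads "sequential adjustment" as a Holm-type step-down procedure, which may not be the exact variant the authors had in mind. The p-values and function names are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha under the simultaneous Bonferroni adjustment."""
    return alpha / n_comparisons


def sequential_bonferroni(p_values, alpha=0.05):
    """Holm-type step-down adjustment (assumed reading of 'sequential').

    Tests the smallest p-value at alpha/k, the next at alpha/(k - 1), and
    so on, stopping at the first nonsignificant comparison. Returns a list
    of reject decisions in the original order of p_values.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    remaining = len(p_values)
    for idx in order:
        if p_values[idx] <= alpha / remaining:
            reject[idx] = True
            remaining -= 1
        else:
            break  # all larger p-values are also retained
    return reject


# Hypothetical p-values for four planned comparisons of b-weights.
p_values = [0.010, 0.040, 0.015, 0.300]
per_test_alpha = bonferroni_alpha(0.05, len(p_values))
simultaneous = [p <= per_test_alpha for p in p_values]
sequential = sequential_bonferroni(p_values, alpha=0.05)
print(simultaneous)
print(sequential)
```

With these values the sequential procedure rejects one more null hypothesis than the simultaneous one, which illustrates the gain in power noted above.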

This approach should provide a reasonable degree of protection of Type I error rate
while minimizing Type II errors. It assumes that the covariances of estimates for all
parameters are zero. As discussed above, this assumption is violated when the same
independent  variables  appear  in  multiple  equations.  The  principal  assumption  of 
importance  is  undoubtedly  independence  across  levels  of  the  moderator  variable,  which  is 
satisfied  by  the  independent  groups  analysis.  Nevertheless,  it  is  possible  in  the 
simultaneous  equation  approach  to  generalize  the  post-hoc  analysis  by  computing  linear 
contrasts  across  the  vector  of  regression  coefficients  and  then  creating  an  appropriate 
standard  error  for  the  contrasts  by  pre-  and  post-multiplying  the  covariance  matrix  of  the 
estimates  by  the  vector  of  contrasts.  At  this  point,  it  is  not  known  what  the  inflation  of 
the Type I error rate is under these conditions, although it seems plausible that the degree
of inflation is minimal. This problem has not, to our knowledge, been simulated in the
statistical literature. Our recommendation is to proceed with the simpler post-hoc
comparisons under independence assumptions (particularly if the single equation approach
is used).

This recommendation is driven by pragmatic constraints. It is too time-consuming to
calculate the asymptotically exact standard errors of the contrasts by hand by using the
covariance matrix of the estimates. Indeed, it will be tedious to compute the pair-wise
comparisons by hand, even when using the simultaneous Bonferroni adjustment to the
critical value of the test statistic and only employing the standard errors of estimate. In
principle, it would be a straightforward programming task to create a program to
generate post-hoc statistics on the regression coefficients, incorporating sequential
Bonferroni adjustments, use of the entire covariance matrix of estimates to generate
standard errors, and options for setting (and perhaps changing) the desired experimentwise
Type I error rate. Employment of the covariance matrix of estimates is most easily
and efficiently done using the LISREL program and the simultaneous equation approach.
LISREL can, upon user request, output both the regression parameter matrix and the
covariance matrix of the estimates. From these matrices, the appropriate t statistics can
be calculated directly in matrix form. It would therefore be possible to program the
asymptotically unbiased post-hoc tests using the entire covariance matrix of the estimates.
Indeed, our past experience with algorithms of this type is that most of the
programming overhead involves constructing the input matrices and the formatting of the
output statistics rather than the statistical algorithms. Thus the programming
time for the comparisons using the entire covariance matrix of the estimates would
differ trivially from that for comparisons using only the standard errors of estimate. The PPiCH
unit may wish to devote some programmer time to development of this program.
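
The core of such a program is small. The sketch below pre- and post-multiplies a covariance matrix of estimates by a contrast vector to obtain the standard error of a linear contrast. The coefficient vector, covariance matrix, and function name are hypothetical; reading these matrices from actual LISREL output is not shown and would be the larger part of the work.

```python
import math


def contrast_t(b, cov, c):
    """Asymptotic t statistic for the linear contrast c'b.

    b   : list of estimated regression coefficients
    cov : covariance matrix of the estimates (list of lists)
    c   : contrast weights, one per coefficient
    The variance of the contrast is c' cov c.
    """
    estimate = sum(ci * bi for ci, bi in zip(c, b))
    k = len(c)
    variance = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    return estimate / math.sqrt(variance)


# Hypothetical estimates for two coefficients with a nonzero covariance.
b = [0.50, 0.20]
cov_full = [[0.010, 0.004],
            [0.004, 0.010]]
cov_diag = [[0.010, 0.000],
            [0.000, 0.010]]   # same variances, covariance ignored
c = [1, -1]                   # pairwise difference b[0] - b[1]

t_full = contrast_t(b, cov_full, c)
t_indep = contrast_t(b, cov_diag, c)
print(t_full, t_indep)
```

With a positive covariance of estimate, the statistic based on the full matrix is larger than the one computed under the independence assumption; with a negative covariance the reverse would hold.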

Within-groups  Moderator  Analysis 

The  within-groups  moderator  analysis  is  required  whenever  equations  are  to  be 
compared  across  variables  measured  on  the  same  persons.  One  example  is  where 
equations are to be compared across multiple work settings (e.g., prediction of satisfaction
in multiple settings from the same background variables). A unique but potentially
important case in the POCO data sets would be the special case of time of measurement
as a categorical moderator variable. The concept is actually inherent in the lagged
endogenous models discussed above. For example, one might wish to know if the within-occasion
predictors for intent to remain in the Navy have changed from 1982 to 1986 for
the same officer cohort. Provided that the independent variables are scaled in the same
way (although if not, the test can still be performed after judicious rescaling of the
predictors) at both occasions, it is possible to test for stationarity in the prediction
equations by doing a repeated measures test of the equality of the regression equations
over time. This sort of approach is conceptually related to but distinct from the lagged
endogenous causal models discussed in Sections I and II, but may be important for
forecasting purposes. The basic logic would also apply, however, to testing for equality of
effects  in  the  lagged  endogenous  causal  models. 

The  crucial  issue  here,  from  a  statistical  perspective,  is  that  the  regression  equations 
are correlated because they are calculated on the same observational units. Thus, any
test of the equality of the regression coefficients from these related equations must take
the covariances of estimate from the regression equations into account (James & Tetrick,
1984).

Separate Regression Equations. James and his colleagues (James, Joe, & Irons, 1982;
James & Tetrick, 1984) have outlined a procedure by which testing of related regression
equations may be accomplished. The procedure involves (1) calculation of regression
coefficients for the separate equations, and (2) use of asymptotic theory to derive an
appropriate estimate of the covariance matrix of the estimates for the related equations.
These computations are relatively tedious, but manageable. Using the matrix of
estimated regression coefficients from the related equations and the approximate
covariance  matrix  of  these  estimates,  one  can  derive  F-tests  for  the  comparisons  of 
subsets  of  regression  coefficients.  The  approach  is  quite  general  and  powerful,  and 
handles  the  case  of  planned  comparisons.  Copies  of  the  relevant  papers  have  already  been 
provided  to  the  work  group. 

Simultaneous Equation Approach. It is possible to use the LISREL program to test
the equality of the related regression equations. Unlike the independent samples case
described above, the data are treated as a single group. However, the full system of
regression equations is specified, with a separate equation for each level of the moderator variable.
So, in the case of time of measurement, one would simultaneously specify the equations
for the 1982 and 1986 waves in the same model. The same logic with respect to
statistical inference applies. One first computes the model for all equations, and
(assuming an overidentified model) obtains the LR statistic and the parameter
estimates. One then runs a second model in which equality constraints are imposed on
those coefficients that are hypothesized to vary across levels of the within-groups
moderator variable. The difference in LR χ² tests is a test of the null hypothesis that the
regression  coefficients  are  equal  over  time. 

One  advantage  of  the  LISREL  approach  is  that  it  is  possible  to  combine  the  subgroup 
and  related-equations  moderator  analysis  into  a  single  model,  when  appropriate. 

Continuous  Moderator  Variables 


Analysis  of  continuous  moderator  variables  poses  additional  problems.  It  is  relatively 
simple  to  use  hierarchical  regression  techniques  to  test  for  differential  impact  on 
continuous  moderator  variables,  provided  that  one  assumes  the  nature  of  moderation  is 
linear  and  continuous  across  the  range  of  the  moderator  and  other  independent  variables 
(James, 1987). (In fact, this assumption is often unreasonable.) The test procedure is
identical to that with categorical moderator variables, in that product variables are
formed by multiplying the moderator variable(s) and the independent variable(s). From
there, the same three-stage process is employed. Stage 3 involves adding the product
variables to the equation and testing the increment to R². If the product terms do not
increase prediction, then one can consider using the equations from stage 1 or stage 2,
depending upon (1) the logical status of the moderator as a predictor variable in the
system of equations and (2) the significance of the regression coefficients involving the
candidate  moderator  variable.  In  practice,  continuous  variable  interactions  of  this  type 
are  not  usually  moderator  variable  analyses  per  se,  but  rather  tests  of  additivity  of  causal 
influence  across  endogenous  variables.  So  in  all  likelihood,  the  variables  were  already  in 
the  system  of  equations  and  would  be  kept  in  the  final  equations. 

The  real  sticky  wicket  is  what  to  do  if  continuous  variable  interactions  are  detected. 
The two chief regression texts that review these issues (Cohen & Cohen, 1983; Pedhazur,
1982) differ quite dramatically on what to do in such circumstances. Pedhazur eschews
the practice generally, for reasons we do not find compelling. Cohen and Cohen (1983)
suggest nested substitution of equations so as to be able to graph the nature of the
interaction.  This  is  descriptively  informative  but  not  necessarily  sufficient.  Rules  for 
calculating  direct  and  indirect  effects  in  structural  equation  models  in  the  presence  of 
interaction  have  been  discussed  perfunctorily  in  the  sociological  literature.  Conceptually, 
the  problem  is  analogous  to  the  more  familiar  issue  in  ANOVA:  how  does  one  interpret 
main  effects  (linear  effects  of  variables)  in  the  presence  of  interaction  (continuous 
variable  moderation)"7  The  answer  in  regression,  as  in  ANOVA,  is  that  the  simple  linear 
effect  of  x  on  v,  in  the  presence  of  interaction  involving  x  and  z,  is  descriptively 
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meaningful  (in  essence,  consistent  direction  but  variable  magnitude  of  relationship  of  x 
and  y  over  the  entire  range  of  the  moderator,  2)  but  uninterpretable  from  a  causal  point 
of  view.  Small  wonder  most  investigators  choose  not  to  even  look  for  interactive  effects 
(assuming  they  do  not  exist)  or,  alternatively,  convert  moderators  to  categorical  variables 
to  assist  in  ease  of  computation  and  interpretation.  As  pointed  out  bv  Cohen  and  Cohen 
(1983),  this  approach  throws  away  information  if  the  independent  variables  are  both 
continuous  and  the  interaction  is  linear  in  both  variables.  Of  course,  given  discontinuous 
interaction  effects,  the  grouping  approach  may  be  superior,  provided  that  the  proper 
cutoffs  for  assigning  groups  is  known  a  priori,  stumbled  upon  bv  chance,  or  detected  by 
interpretation  of  scatter  plots,  regression  residuals,  and  other  techniques  (.lames,  1987). 
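The interpretive difficulty with the "main effect" of x can be made concrete by rearranging the product-term equation. The notation below (b_0 through b_3 for unstandardized regression weights, e for the disturbance) is ours for illustration and is not taken from the texts cited above.

```latex
\begin{align}
y &= b_0 + b_1 x + b_2 z + b_3 xz + e \\
  &= (b_0 + b_2 z) + (b_1 + b_3 z)\,x + e
\end{align}
```

The slope of y on x is therefore b_1 + b_3 z, which changes with every value of the moderator z. The coefficient b_1 by itself is the slope only at z = 0, which is why it cannot be read as a single causal effect when b_3 is nonzero. Graphing the conditional slope at selected values of z is one way to carry out the descriptive approach attributed above to Cohen and Cohen (1983).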

The real problem is how to introduce continuous variable moderation into simultaneous
equation approaches. Here the simplicity of the testing procedures in single
equation  approaches  is  appealing.  The  problem  is  that  introduction  of  the  product 
variables  into  the  regression  equations  causes  a  specification  error  in  terms  of  the 
hypothesis  of  uncorrelated  errors  in  equations.  It  also  introduces  correlations  between 
regression  coefficients  and  disturbances,  which  must  be  modeled  explicitly  if  the 
regression  coefficients  and  associated  standard  errors  are  to  be  unbiased  (by  specification 
error). Kenny and Judd (1984) have discussed this issue in latent variable modeling (see
also  Hayduk,  1987).  The  only  method  for  handling  this  type  of  analysis  is  to  use 
covariance  structure  models  like  COSAN  that  can  impose  nonlinear  constraints  on 
parameter  estimates.  We  suspect--and  hope--that  much  more  will  be  known  about  this 
problem  in  a  few  years.  For  now,  we  suggest  that  tests  of  continuous  interaction  be 
entertained  on  theoretical  grounds,  and  investigated  using  hierarchical  regression,  if 
needed.  We  do  not  believe  enough  is  known  about  the  introduction  of  product  variables 
into  structural  equation  models  to  justify  staff  effort  to  learn  the  nuances  of  nonlinear 
constraint specification and how to use the COSAN program (which makes LISREL look
like  BASIC). 


SECTION  IV.  FUTURE  DIRECTIONS 

The  intent  of  this  section  is  to  furnish  recommendations  for  future  research  that  has 
a  scientific  emphasis.  We  will  be  brief  because  we  wish  only  to  highlight  possible  avenues 
for  work  group  consideration.  On  the  other  hand,  we  are  prepared  to  work  with  staff  at 
this  time  on  these  methods  if  they  are  considered  desirable  for  immediate  emphasis  and 
evaluation. 

Use  of  Categorical  Endogenous  Variables 

Some  of  the  endogenous  variables  in  the  data  set  are  true  categorical  variables.  We 
have  discussed  using  true  categorical  variables  as  moderators,  but  this  mainly  applies 
when  the  categorical  variables  divide  individuals  into  mutually  exclusive  groups.  When 
outcome  (criterion)  measures  are  categories,  the  project  team  may  prefer  to  predict  the 
criterion  rather  than  test  for  moderation  by  it.  For  example,  a  crucial  problem  is 
predicting  the  retirement  decision  (stay  in  the  Navy,  opt  for  retirement,  opt  for 
retraining,  etc.).  Knowing  which  variables  provide  prediction  of  the  outcome  categories  is 
different  than  asking  whether  other  variables  differ  in  relationship  according  to  outcome 
category.  The  situation  is  made  more  complex  when  the  prediction  equation  is  actually 
nested  in  a  multiple  equation  structural  model. 

Experts  differ  on  whether  one  can  introduce  categorical  endogenous  variables  into 
linear structural models of the kind we have been discussing. We admire the courage of
Bentler and Chou (1987), who categorically state, without much supporting argumentation
or data, that this approach is fully acceptable provided that the marginal frequencies are
not excessively disproportional (e.g., an 80% to 20% split) and becomes more acceptable
as the number of categories increases. There are other alternatives that can be
considered. One is the use of logit regression to predict the categorical dependent
variable. We would generally recommend this approach if (1) the single equation approach
has been used and (2) even if not, if the categorical variable is a final outcome, like
retirement decision. More elegant analyses for categorical variables include techniques
like latent class analysis, including the Grizzle/Koch/Landis approach for GLS estimation
of  effects,  and  event  history  analysis.  The  latter  is  akin  to  logit  regression  but  takes  into 
account,  and  indeed  models  explicitly,  the  time  course  of  shifts  in  category  group 
membership.  Allison  (1982,  1984)  provides  a  useful  introduction  to  this  set  of  techniques. 
We  do  not  recommend  this  as  a  general  approach  for  the  research  team  at  this  time,  given 
the  relatively  limited  time  remaining  to  analyze  the  data  set. 
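As an illustration of the logit alternative, the following Python sketch fits a binary logit model by Newton-Raphson iteration. It rests on several assumptions that are not part of the project design: the outcome is simplified to two categories (stay versus leave) rather than the several retirement options mentioned above, the predictor name (satisfaction) and the function fit_logit are hypothetical, and the data are simulated.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit a binary logit model by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))             # predicted probabilities
        gradient = X.T @ (y - p)                        # score vector
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Simulated data (illustrative only): probability of staying rises with the predictor.
rng = np.random.default_rng(1)
n = 2000
satisfaction = rng.standard_normal(n)                   # hypothetical predictor
true_logit = -0.5 + 1.0 * satisfaction
stay = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), satisfaction])
beta_hat = fit_logit(X, stay)                           # [intercept, slope] on the logit scale
```

The coefficients are on the log-odds scale, so they are not directly comparable to the linear regression weights in the other equations of a structural model. An outcome with more than two unordered categories would call for a multinomial extension, which this sketch does not cover.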


Latent  Variable  Structural  Equation  Models 


Although  we  have  recommended  manifest  variable  designs,  given  the  time  constraints 
on the project, it would be preferable to conduct analyses using the full LISREL approach,
particularly on the lagged endogenous variable models. We wish to discuss briefly the
benefits  of  doing  the  full  latent  variable  models. 


The  chief  benefit  of  the  latent  variable  models  is  that  the  structural  regression 
coefficients are disattenuated for measurement error. It is frequently astonishing to
observe  the  degree  of  impact  measurement  error  can  have  in  structural  equation  models 
when single indicators have moderate reliabilities. As we point out in Section I, the
reliabilities  reported  for  the  candidate  variables  are  encouraging,  and  composite  variables 
are  usually  more  reliable  than  their  individual  constituent  variables.  Nevertheless,  it 
would be desirable to estimate effects without contamination of measurement error.
Another related benefit of structural equation models with latent variables is the
opportunity  to  test  directly  assumptions  of  equivalence  of  the  measurement  model  in 
lagged  designs.  We  often  assume,  by  fiat,  that  composite  variables  measure  the  same 
construct  in  equivalent  ways  across  time  (or  across  groups).  The  chief  advantage  of 
longitudinal  measurement  models  is  the  ability  to  test  equivalence  in  the  measurement 
model using the type of LR tests described above, but where the tests are tests of
constraints  on  the  regression  coefficients  of  observed  variables  on  latent  variables  (rather 
than  tests  on  the  structural  regression  coefficients  themselves).  Hertzog  (1987)  has 
reviewed  some  studies  that  have  employed  this  approach  in  examining  adult  intellectual 
development and measurement properties of mood state variables (see also Hertzog &
Nesselroade,  1987;  Hertzog  &  Schaie,  1986,  1988  in  Appendices  C  through  E  for  detailed 
examples). 
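The impact of measurement error noted at the start of this discussion can be illustrated with the classical correction for attenuation, which is the bivariate analogue of what latent variable models do for structural coefficients. The sketch below is illustrative only: the function name disattenuate and the reliability values are assumptions, not figures from the data set.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical values: an observed correlation of .30 between two measures
# with reliabilities of .70 and .80.
observed = 0.30
corrected = disattenuate(observed, rel_x=0.70, rel_y=0.80)
```

With these assumed values the corrected correlation is about .40, a third larger than the observed value, even though both reliabilities would ordinarily be considered acceptable.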


Another  advantage  of  latent  variable  models  is  the  commensurate  increase  in  the 
validity  of  the  regression  coefficients.  Provided  that  there  is  minimal  sharing  of  method 
variance, the structural regression estimates from latent variable models are more likely
to  represent  construct  relationships  than  systematic  measurement  (method)  variance. 

Another  useful  application  of  structural  modeling  is  in  the  domain  of  confirmatory 
factor  analysis  itself.  Although  manifest  variable  designs  with  composites  can  be 
appropriate,  they  are  more  fully  justified  if  it  can  be  shown  that  the  indicators  do  indeed 
factor  as  hypothesized  (perhaps  implicitly)  by  the  compositing  scheme.  Thus,  it  is  possible 
to  do  confirmatory  factor  analysis  to  justify  compositing  variables,  and  then  use  the 
composites to test the continuous interaction hypotheses using hierarchical regression.
The composites can also, in such conditions, be formed by use of factor score estimation
procedures  rather  than  simple  unit-weighting  of  z-scores.  Hultsch,  Hertzog,  and  Dixon 
(198**)  used  this  approach  to  examine  age  X  intelligence  interaction  effects  in  predicting 
text  memory  in  adults  (see  Appendix  F  for  details). 
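For reference, simple unit weighting of z-scores can be sketched as follows. The function name unit_weighted_composite and the small score matrix are hypothetical; factor score estimation would replace the equal weights with weights derived from the confirmatory factor model.

```python
import numpy as np

def unit_weighted_composite(indicators):
    """Average the z-scores of each indicator column (equal, unit weights)."""
    data = np.asarray(indicators, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return z.mean(axis=1)

# Hypothetical scores: rows are individuals, columns are three indicators
# measured on different scales.
scores = np.array([[3.0, 10.0, 1.0],
                   [4.0, 14.0, 2.0],
                   [5.0, 12.0, 4.0],
                   [2.0,  8.0, 1.0]])
composite = unit_weighted_composite(scores)
```

Standardizing first keeps an indicator with a large raw variance from dominating the composite, which is the usual reason for compositing z-scores rather than raw scores.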


SECTION  V.  SUMMARY 

We  have  outlined  a  series  of  research  design  and  analysis  options  for  staff  to 
consider.  In  a  report  like  this,  it  is  difficult  to  specify  exactly  what  an  appropriate 
structural  model  would  look  like.  This  is  best  done  in  direct  design  consultation  with  the 
contractors  on  a  specific  research  problem,  bringing  theory  about  measurement  and  latent 
variable  relationships  to  bear  in  the  design  phase. 

Our  general  recommendation  has  been  for  the  work  group  to  proceed  immediately 
with manifest variable regression analysis that has predictive utility, is more easily
summarized and communicated to higher levels in the Navy, is scientifically defensible,
and can be accomplished in relatively short order. This decision is driven in large part by
pragmatic considerations. An alternative is for the work group to decide to take
additional time and to concentrate on some of the latent variable techniques described in
the  last  section  of  the  report.  It  is  important  to  note  that,  should  the  work  group  decide 
to  pursue  latent  variable  structural  equations  analysis,  then  it  is  advisable  to  consider  a 
roughly two-stage process (Anderson & Gerbing, 1988): (1) development of the measurement
model for all exogenous and endogenous latent variables with confirmatory factor
analysis, followed by (2) incorporation of the structural regression model into the
previously developed measurement model. This approach has two advantages. First, one
can be confident that the structural model is not contaminated by specification errors in
the measurement model. Usually, it is the structural coefficients that are of primary
interest, and one does not want spread of specification error from the measurement model
in an FIML approach to bias structural regression coefficients. Second, it is possible to
treat the full structural model as a more restricted, nested model from the measurement
model, and to then calculate a difference in χ² statistic that separates lack of fit in the
structural  model  from  the  overall  fit  of  the  model,  combining  lack  of  fit  in  both  structural 
and  measurement  submodels.  This  approach  provides  a  more  accurate  assessment  of  the 
viability  of  the  structural  model. 
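The nested comparison just described can be written compactly. The subscripts M (measurement model, with latent variables freely correlated) and S (full structural model) are our labels for illustration.

```latex
\begin{align}
\Delta\chi^2 &= \chi^2_{S} - \chi^2_{M}, \\
\Delta df    &= df_{S} - df_{M}.
\end{align}
```

Because the structural model is nested within the measurement model, the difference is nonnegative and, if the structural restrictions hold, is itself distributed asymptotically as chi-square with degrees of freedom equal to the difference in degrees of freedom. A large difference relative to its degrees of freedom points to misfit in the structural relations specifically, whereas the overall statistic for the structural model mixes structural and measurement misfit.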

The  implication  of  the  foregoing  is  that,  if  the  PDCD  work  group  decides  to  proceed 
with  latent  variable  rather  than  manifest  variable  modeling,  then  the  immediate  strategy 
should  be  to  begin  work  on  the  confirmatory  factor  analysis  of  multiple  indicators  for 
latent  variables  rather  than  computation  of  composites.  In  either  case,  we  look  forward 
to  working  with  the  group  in  adapting  the  general  principles  described  here  to  specific 
analyses.

The unmeasured variables problem has not received adequate attention in applications
of path analysis. The ramifications of inadequate attention to this problem are
addressed in respect to correlations between causal variables and the errors of causal
equations and the resulting bias in solutions of path coefficients. The discussion
recognizes that obviation of the unmeasured variables problem is an unrealistic
objective. Consequently, logic is provided in the form of decision steps to help
investigators ascertain whether the influence of unmeasured variables that can be
expected in any particular analysis is of sufficient seriousness to preclude the use of
path analysis.


In their review of path analysis studies, Billings and Wroten (1978) concluded that
many biased estimates of path coefficients had been reported in the
industrial-organizational literature. A primary reason cited was that relevant causal
variables had not been included in the causal systems investigated. This unfortunate
practice is generally referred to as the unmeasured (or omitted) variables problem
(Duncan, 1975). The recommended solution to the unmeasured variables problem is to
measure reliably all variables that are causes of an endogenous (dependent) variable
and are correlated with other causes of that endogenous variable. Regrettably, in most
cases this solution is impossible to achieve, if for no other reason than that all
relevant causes of an endogenous variable might not even be known (Duncan, 1975;
Heise, 1975; Kenny, 1975). Consequently, the operative question is not whether one has
an unmeasured variables problem but rather the degree to which the unavoidable
unmeasured variables problem biases estimates of path coefficients and provides a
basis for alternative explanations of results (Fisher, 1971; James & Singh, 1978). In
actual practice, it is not uncommon to allow certain trade-offs, where the costs of
omitting at least known causes from the causal system are evaluated in terms of their
importance to the overall system and the degree to which obtained estimates of path
coefficients for measured causes might be biased. Decision rules for evaluating these
costs have up to now remained largely enigmatic.

This article has two objectives. The first is to summarize briefly the bases for the
unmeasured variables problem and the ramifications of this problem, namely, biased
solutions of path coefficients. The second objective is to provide a set of subjective
decision steps that identify conditions in which an unmeasured variables problem is
not likely to bias seriously the estimates of path coefficients for measured causes. In
addition, several inaccuracies are noted in regard to the Billings and Wroten (1978)
discussion of, and recommendations for solving, the unmeasured variables problem.

The discussion below focuses on the application of path-analytic procedures to
cross-sectional, unidirectional causal models that employ nonexperimental data.
Relatively simple models are employed for illustrative purposes, and all variables are
considered to be in standardized form. With the exception of the unmeasured variables
problem, all other assumptions required in path analysis are assumed to be satisfied
(e.g., correctly specified causal order and direction, linearity, interval scales, and
no random measurement error in independent variables). In the elaboration of the
argument, it was necessary to focus statistical treatments on theoretical path
equations for populations. The term solution is employed for these treatments so as
not to confuse them with estimates of path coefficients provided by multiple
regression, that is, ordinary least squares (OLS). However, the degrees of bias
represented in the solutions would be the same as those represented in population OLS
estimates. (Sample OLS estimates require the addition of sampling error.)

The Unmeasured Variables Problem and
Correlations  With  Error  Terms 

The existence of an unmeasured variables problem reflects a violation of an important
assumption in path analysis. This assumption is that the causes for a dependent
endogenous variable are uncorrelated with the error term (disturbance, residual) of
the causal equation for that endogenous variable (as well as the error terms of
equations for all endogenous variables that occur later in the causal order; Duncan,
1975; Johnston, 1972). This assumption implies that the causal variables in a theoretic
path equation should be unrelated to unmeasured causes of the dependent endogenous
variable inasmuch as the unmeasured causes are included in the error term.
Satisfaction of the assumption is a necessary but not a sufficient condition for
unbiased solutions of path coefficients and implies further that the error terms of
different path equations in a hierarchical system of equations will be uncorrelated
(Duncan, 1975).

To illustrate the issues, Figure 1a displays a path (causal) model in which $X_1$ is a
cause of the two endogenous variables, $X_2$ and $X_3$. $X_2$ is also a cause of
$X_3$. The $u_i$ ($u_2$ and $u_3$) are error terms that, based on the assumptions
above, may have the following two components: unmeasured causal variables, which will
be labeled by Zs, and random shocks (RS), which are unstable, minor causal influences
that are generally assumed to be independent of one another. The $p_{ij}$ are path
coefficients, defined as the mean change (in standard deviation units) in a dependent
endogenous variable expected to result from each unit of change in a causal variable,
assuming all other causal variables in an equation are held constant (Darlington &
Rom, 1972).
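When no unmeasured causes are present, the path coefficients for a model like Figure 1a follow from the correlations among the measured variables by solving the normal equations for each endogenous variable. The Python sketch below illustrates this under stated assumptions: the function name path_coefficients and the correlation values are hypothetical and are not taken from the article.

```python
import numpy as np

def path_coefficients(R, causes, effect):
    """Solve the normal equations R_cc p = r_ce for standardized path coefficients."""
    R = np.asarray(R, dtype=float)
    R_cc = R[np.ix_(causes, causes)]  # correlations among the causes
    r_ce = R[causes, effect]          # correlations of each cause with the effect
    return np.linalg.solve(R_cc, r_ce)

# Hypothetical correlations among X1, X2, X3 (indices 0, 1, 2).
R = np.array([[1.00, 0.50, 0.40],
              [0.50, 1.00, 0.60],
              [0.40, 0.60, 1.00]])

p21 = path_coefficients(R, causes=[0], effect=1)[0]       # X1 -> X2
p31, p32 = path_coefficients(R, causes=[0, 1], effect=2)  # X1 -> X3, X2 -> X3
```

With a single cause the path coefficient equals the zero-order correlation (here .50); with two causes each coefficient is a standardized partial regression weight, so it reflects a control for the other cause.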

Figure 1. Illustrations of unidirectional causal models with specifications on the
error terms.

If it is postulated that no unmeasured variables are present, then it is possible to
proceed to solve for the path coefficients in Figure 1a. In this condition, the error
terms would involve only the RS components, which by definition cannot be reliably
measured. Thus, the assumption that the causal variables for an endogenous variable
are uncorrelated with the error term for that endogenous variable would be satisfied,
which connotes that $X_1$ is uncorrelated with $u_2$ and $u_3$, and that $X_2$ is
uncorrelated with $u_3$. Note, however, that no assumption is required that $X_3$ be
uncorrelated with $u_2$; that is, no assumption is required concerning relationships
between errors and endogenous variables when the endogenous variables occur later in
the causal order than the errors (Duncan, 1975). In this regard, Billings and Wroten
(1978) were inaccurate when they stated the assumption in the following manner: "the
residuals of endogenous variables are not correlated with one another or with any
other endogenous variables" (p. 680, italics added).

Suppose that the error terms are not comprised of RS components exclusively, but
rather that an unmeasured causal variable, Z, is present in both error terms. Suppose
further that Z is a reliable and major cause of both $X_2$ and $X_3$ and is correlated
with $X_1$. This state of affairs is displayed in Figure 1b, where the error terms have
been decomposed into a Z component and RS components. Of initial importance is the
fact that the errors will be correlated because the same Z appears in both error terms
(i.e., the curved arrow between Z for $X_2$ and Z for $X_3$). This simple example
demonstrates how the effects of unmeasured variables could lead directly to a
violation of the assumption of uncorrelated errors.

The curved arrows from the (same) Zs to $X_1$ reflect correlation between Z and $X_1$
and connote that $X_1$ will be correlated with the error terms for both $X_2$ and
$X_3$. Furthermore, because Z is both a cause of $X_2$ and is represented in the $X_3$
error term, it must be assumed that $X_2$ is correlated with the error term for $X_3$.
Thus, all possible assumptions regarding correlations between causes and error terms
may, at this time, be regarded as violated. The ramifications of this condition are
discussed below.


Biased  Solutions  for  Path  Coefficients 

Suppose that the path model displayed in Figure 1b is operable but that an
investigator assumed incorrectly that the error terms were comprised of RS components
only. The investigator could solve for the path coefficients, but they would likely
be biased. To illustrate the bias, a false model (Z not included) is compared to a true
model (Z included) to determine the consequences of employing the false model to solve
for the path coefficients (Duncan, 1975). For example, based on Figure 1b, the path
equation for $X_2$ in the false model is

$$X_2 = p_{21} X_1 + u_2, \tag{1}$$

in which $u_2$ is incorrectly assumed to be comprised of only RS components. The
normal equation required to solve for $p_{21}$ is simply

$$r_{21} = p_{21}, \tag{2}$$

which connotes that the $X_1 \rightarrow X_2$ path coefficient is equal to the
zero-order correlation coefficient. The true path equation for $X_2$ assumes that Z is
measured and is

$$X_2 = p'_{21} X_1 + p'_{2Z} Z + RS_2, \tag{3}$$

in which primes are employed to designate path coefficients in the true model.

To determine the bias resulting from employment of Equation 1 rather than Equation 3
to solve for the $X_1 \rightarrow X_2$ path coefficient, we multiply through Equation 3
by $X_1$, take expectations, and express the results in terms of correlations. The
result is

$$r_{21} = p'_{21} + p'_{2Z} r_{Z1}. \tag{4}$$
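The intermediate step behind Equation 4 can be written out as follows. This expansion is ours and assumes, as in the text, that all variables are standardized and that the random shock component $RS_2$ is uncorrelated with $X_1$.

```latex
\begin{align}
E(X_1 X_2) &= p'_{21}\,E(X_1^2) + p'_{2Z}\,E(X_1 Z) + E(X_1\,RS_2) \\
r_{21}     &= p'_{21}\,(1) + p'_{2Z}\,r_{Z1} + 0
\end{align}
```

For standardized variables the expected cross product of two variables is their correlation and the expected square of a variable is 1, which gives the right-hand side of Equation 4 directly.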

Comparison of Equation 2 with Equation 4 suggests that the use of $r_{21}$ to solve for
$p'_{21}$ in the false model results in a bias equal to $p'_{2Z} r_{Z1}$. That is,

$$r_{21} = p_{21} = p'_{21} + p'_{2Z} r_{Z1}, \tag{5}$$

and thus $p_{21}$ differs from $p'_{21}$ by an amount equal to $p'_{2Z} r_{Z1}$.
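The bias expression can be checked numerically at the population level. The Python sketch below is illustrative only: the function name false_and_true_solutions and the coefficient values (.30, .40, .50) are assumptions chosen for the example. It builds the correlations implied by the true model, solves the false model (Z omitted) and the true model (Z included), and compares the two solutions.

```python
import numpy as np

def false_and_true_solutions(p21_true, p2z_true, r_z1):
    """Population solutions for the X1 -> X2 path when Z is omitted versus included."""
    # Correlations implied by the true model (Equation 3) for standardized variables.
    r_21 = p21_true + p2z_true * r_z1
    r_2z = p2z_true + p21_true * r_z1
    p21_false = r_21  # Equation 2: with Z omitted, the solution is the correlation
    # True model: solve the normal equations for X2 on X1 and Z.
    R_cc = np.array([[1.0, r_z1], [r_z1, 1.0]])
    p21_recovered, p2z_recovered = np.linalg.solve(R_cc, np.array([r_21, r_2z]))
    return p21_false, p21_recovered, p2z_recovered

p21_false, p21_recovered, p2z_recovered = false_and_true_solutions(0.30, 0.40, 0.50)
bias = p21_false - p21_recovered  # equals p2z_true * r_z1
```

With these assumed values the false-model solution is .50 against a true coefficient of .30, so the bias of .20 equals the product of the omitted path coefficient (.40) and the correlation between Z and $X_1$ (.50). Setting either quantity to zero in the call removes the bias.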

These derivations suggest directly that $p_{21}$ will be biased if both $p'_{2Z}$ and
$r_{Z1}$ are greater than zero. In other words, if the unmeasured Z is a cause of $X_2$
and correlated with $X_1$, then $p_{21}$ will be biased. If we disregard suppressors,
then the bias will be in the direction of a $p_{21}$ that is too large; this is a
direct result of failure to control for the effects of Z in solving for $p_{21}$.

It is extremely important to note, however, that if either $p'_{2Z}$ or $r_{Z1}$ is
zero, or approximately zero, then little or no bias will exist in $p_{21}$. This
suggests that bias will not occur if an unmeasured variable is in fact a cause of the
dependent endogenous variable but is unrelated to the measured causes of the same
variable. Consequently, it is not necessary to assume that all major causes of a
dependent endogenous variable have been measured. Rather, an unmeasured cause must
also be correlated with the measured causes before bias will ensue.

It is also important to recognize that there are degrees of causation; the magnitude
of $p'_{2Z}$ might be anywhere on a continuum from low, to moderate, to high.
Similarly, the magnitude of $r_{Z1}$ may vary from zero, or approximately zero, to low,
moderate, or high. Clearly, the product term $p'_{2Z} r_{Z1}$ may assume many
permutations, only some of which are likely to result in serious bias of the solution
of the path coefficient. For pragmatic purposes, it is assumed that those most likely
to lead to serious bias are high-high, moderate-high, high-moderate, and
moderate-moderate. Consequently, an unmeasured variables problem does not necessarily
have to result in seriously biased solutions of the path coefficients. It is with the
question of degree that the investigator (or critic) should be concerned. However,
whenever Z is unmeasured, this is necessarily a subjective process (possible empirical
procedures are addressed later).
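The permutations can be tabulated for a rough sense of scale. The numeric values attached to "low," "moderate," and "high" below (.10, .30, .50) are arbitrary assumptions made for illustration; the article does not define these levels numerically.

```python
# Assumed magnitudes for each verbal level (illustrative only).
levels = {"low": 0.10, "moderate": 0.30, "high": 0.50}

# Bias for each (strength of cause, strength of correlation) combination,
# computed as the product of the omitted path coefficient and r_Z1.
bias_grid = {
    (cause, corr): round(p * r, 3)
    for cause, p in levels.items()
    for corr, r in levels.items()
}
```

Under these assumed values the four combinations singled out in the text give biases from .09 (moderate-moderate) to .25 (high-high), while any combination involving a low value gives .05 or less.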

An unmeasured variables problem will also not result in seriously biased solutions of
path coefficients if an unmeasured cause is correlated highly with a measured cause.
This can be demonstrated by remembering that the path coefficients involve controls
for the other causal variables in a path equation. For example, consider the path
equation if the unmeasured Z is included theoretically in the $X_2$ equation. This
equation is

$$X_2 = p_{21 \cdot Z} X_1 + (p_{2Z \cdot 1} Z) + u_2, \tag{6}$$

in which the parentheses connote theoretical inclusions.

The path coefficient for the unmeasured Z would be approximately equal to zero if Z
and $X_1$ were correlated highly (e.g., .95) and a control for $X_1$ were effected.
Consequently, there is no reason to include Z in the equation because it is essentially
redundant with $X_1$ (note also that inclusion of Z would result in a multicollinearity
problem). Moreover, with Z unmeasured, essentially no bias will ensue for the $p_{21}$
path coefficient (i.e., $p_{2Z \cdot 1} r_{Z1} \approx 0$ because
$p_{2Z \cdot 1} \approx 0$).

The illustration above identifies two extremely important and related issues that
should always be considered in relation to unmeasured causes. First, before an
unmeasured cause is likely to create bias in the path coefficients for measured causes
with which it is correlated, it must make a unique contribution to the prediction of
the dependent endogenous variable. That is, it must predict meaningfully the dependent
endogenous variable after controls are effected for the measured causes. Second, the
preceding point can be viewed from the standpoint of redundancy and linear dependence.
If a known but unmeasured cause is essentially redundant (highly correlated) with a
measured cause, then there is no reason to assume that the unmeasured variable will
create serious bias in the path coefficient for the measured variable. Moreover, the
unmeasured cause need not simply be redundant. In more complex models involving
multiple causes, it is sufficient that the unmeasured cause be essentially linearly
dependent on the measured causes. A heuristic consequence of this logic is that as the
number of measured causes increases, the likelihood of an unmeasured variables problem
decreases. That is, even though unmeasured causes exist, they are increasingly likely
to be linearly dependent, or approximately so, on the measured causes as the number of
measured causes increases. Thus, it is possible to have unmeasured causes and yet have
no serious unmeasured variables problem.

The logic developed above for a comparatively simple case of bias transfers directly
to more complex cases, although in more complex cases the direction of bias may be
either positive or negative. For example, serious bias in the solutions for either
$p_{31}$ or $p_{32}$ in Figure 1b is unlikely if one of the following conditions
exists: (a) Z is only a minor cause of $X_3$, (b) Z is not a unique cause of $X_3$
(e.g., Z is linearly dependent on $X_1$ and $X_2$), or (c) Z has low correlations with
$X_1$ and $X_2$. Space limitations preclude statistical development of the more complex
case, and the reader is referred to an analogous development based on unstandardized
variables in Duncan (1975, chap. 8).

Decision  Steps  for  Assessing  the  Seriousness 
of  Unmeasured  Variables  Problems 

Although it is unrealistic to expect obviation of the unmeasured variables problem in
research, it is possible under specified conditions to attempt to minimize bias in path
coefficients to the point that the bias is within "tolerable limits" for research
purposes. In the interest of identifying such tolerable limits, salient points from
prior discussion are summarized below in the form of decision steps that are designed
to help investigators ascertain whether an unmeasured variables problem is sufficiently
serious to preclude the use of path analysis. Presentation of the decision steps must
be prefaced, however, with the caution that many of the decisions require subjective
judgments and the need to make empirically untestable assumptions.

The decision steps are written from the standpoint of one endogenous variable,
although the steps should be applied to each endogenous variable in a causal model.
Furthermore, the decision steps should be employed only when investigators have a
reasonably high degree of confidence in the causal closure and stability of a causal
model. However, the possibility that the model might change in the future as new causes
are discovered should be clearly recognized. This is not, however, sufficient reason to
preclude proceeding with causal analyses, given that no attempt is made to suggest
that the present causal model is unambiguously unique or correct.

The  decision  steps  are  as  follows: 

Step 1. Attempt to identify known major and moderate causes of the
endogenous variable.

If data have not been collected, then attempt to measure the
major/moderate causes, unless there appears to be a good reason not to
include one or more of these variables, as determined in Step 2.

If data have already been collected, then attempt to identify known
major/moderate unmeasured causes. If one or more such causes is believed
to exist, proceed to Step 2. If no major/moderate unmeasured causes are
believed to exist, then exit from the decision steps at this point
(i.e., a serious unmeasured variables problem appears to be unlikely for
this endogenous variable, at least from the perspective of the decision
maker).

Step 2. Postulate whether each major/moderate unmeasured cause is
correlated with one or more of the measured causes, using prior
empirical evidence whenever possible. In designing a path analysis
study, this step and those that follow are meant to be viewed in terms
of causes that are not as yet in the causal model, as compared to causes
already included in the model.

If the correlations between an unmeasured cause and all of the measured
causes are presumed to be low (e.g., 0 to ±.20, although this is
arbitrary), then exit here for that unmeasured cause. Note, however,
that if a different unmeasured cause is included later in the causal
model, then the decisions regarding prior unmeasured causes should be
reevaluated; this applies to all of the following steps. Furthermore, an
exit at this point suggests that the explanatory power of the causal
model in regard to the endogenous variable of interest will be reduced.
On the other hand, if the judgment is correct that all correlations
between the unmeasured cause and the measured causes are low, then the
solutions of the path coefficients for the measured causes are not
likely to be seriously biased.

If an unmeasured cause is believed to have a moderate to high
correlation with one or more of the measured causes, then consider
whether the unmeasured cause is essentially redundant with one of the
measured causes or essentially linearly dependent on some combination of
the measured causes. If prior research and/or judgment allow one to have
confidence in an affirmative response to one of these considerations,
then exit at this point. Note again, however, that although the exit
suggests lack of serious bias, this will occur only if the judgments are
correct.

Step 3. By reaching Step 3, it has been decided that (a) at least one
unmeasured major/moderate cause exists for the endogenous variable of
interest, (b) the unmeasured cause is correlated at least moderately
with one or more of the measured causes, and (c) the unmeasured cause is
neither redundant with one of the measured causes nor linearly dependent
on some combination of the measured causes. This suggests that a serious
unmeasured variables problem exists and that an attempt to solve for the
path coefficients for this endogenous variable based on the measured
causes is likely to result in at least one seriously biased solution.
Consequently, it is recommended that path-analytic procedures not be
employed for this endogenous variable until the unmeasured causes are in
fact measured. (A less desirable possibility might be to delete measured
causes that are presumed to be correlated with unmeasured causes.)

It should be mentioned that in a causal model involving multiple
endogenous variables, it is possible to have serious unmeasured
variables problems for one or more endogenous variables but not for
other endogenous variables (Duncan, 1975, pp. 106-107). It is possible,
therefore, to employ path-analytic procedures for only those endogenous
variables without a serious unmeasured variables problem, although this
is not a highly desirable state of affairs inasmuch as only part of a
causal system would be addressed.
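
The decision steps can be read as a small branching procedure. The
following is only a minimal sketch of that reading, not a procedure given
in the source: the class and function names are hypothetical, every input
is a subjective judgment supplied by the investigator, and the .20 cutoff
is the example value that the text itself calls arbitrary.

```python
from dataclasses import dataclass
from typing import List

# Example cutoff from the text (0 to +/-.20); the source calls it arbitrary.
LOW_CORRELATION = 0.20


@dataclass
class UnmeasuredCause:
    """Investigator judgments about one major/moderate unmeasured cause.

    Hypothetical container: none of these fields can be estimated from the
    data at hand, which is the point of the cautions in the text.
    """
    name: str
    presumed_correlations: List[float]    # judged r with each measured cause
    redundant_or_dependent: bool = False  # judged redundant with, or linearly
                                          # dependent on, the measured causes


def assess_cause(cause: UnmeasuredCause) -> str:
    """Apply Steps 2 and 3 to a single unmeasured cause."""
    if all(abs(r) <= LOW_CORRELATION for r in cause.presumed_correlations):
        return "exit: low correlations"
    if cause.redundant_or_dependent:
        return "exit: redundant or linearly dependent"
    return "serious"


def assess_endogenous_variable(causes: List[UnmeasuredCause]) -> str:
    """Step 1 exit if no cause is listed; serious if any cause reaches Step 3."""
    if not causes:
        return "exit: no major/moderate unmeasured causes identified"
    verdicts = [assess_cause(c) for c in causes]
    if "serious" in verdicts:
        return "serious"
    return "no serious problem indicated"


# Hypothetical judgments for one endogenous variable.
z1 = UnmeasuredCause("Z1", [0.10, -0.15])
z2 = UnmeasuredCause("Z2", [0.45, 0.05], redundant_or_dependent=True)
z3 = UnmeasuredCause("Z3", [0.45, 0.05])
```

Because every exit depends on judgments that may be wrong, the returned
label is best treated as a record of the investigator's assumptions, to be
reevaluated whenever a cause is added to the model.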

Discussion  and  Conclusions 

In concluding, several additional points should be commented on briefly.
First, when unidirectional path models are based on variables collected
at only one point in time, no method is presently available to assess
empirically whether an unmeasured variables problem exists, using the
data at hand.1 The controlling rule is that assumptions that
moderate/major unmeasured causes are essentially uncorrelated with, or
are redundant with/linearly dependent on, measured causes must be
regarded as having been reasonably satisfied before OLS or other forms
of estimation (e.g., maximum likelihood) are employed to estimate the
path coefficients in either a population or a sample (cf. Duncan, 1975).

Second, other empirical approaches are available to assess whether an
unmeasured variables problem exists and to attempt to eliminate bias
created by unmeasured causes. These include time-series analysis,
instrumental variables, and two-stage least squares (cf. Heise, 1970,
1975; James & Singh, 1978; Johnston, 1972; Joreskog, 1978). On the other
hand, I must caution against the Billings and Wroten (1978)
recommendation that rejection of the hypothesis of spuriousness in
cross-lagged panel correlation (XLPC) analysis implies the absence of an
unmeasured variables problem in a cross-sectional path analysis. Assume,
for example, that an unmeasured Z has unique, moderate causal effects on
two measured variables, $X_1$ and $X_2$. Assume further that $X_1$ is a
moderate cause of $X_2$ after a control is effected for Z. In an XLPC
analysis involving only the measured $X_1$ and $X_2$, the hypothesis of
spuriousness (Kenny, 1975) would likely be rejected because $X_1$ is a
moderate cause of $X_2$. Following the logic of Billings and Wroten,
this suggests that the $X_2$ path equation (i.e.,
$X_2 = p_{21} X_1 + u_2$) does not have an unmeasured variables problem.
But this is incorrect. The XLPC analysis demonstrated only that the
$X_1$ and $X_2$ relationship was not completely determined by Z. In the
cross-sectional path equation for $X_2$, if Z remains unmeasured and
therefore a control for Z is not effected for $p_{21}$, then that path
coefficient will be biased (i.e., based on the assumptions given, Z is a
unique, moderate cause of $X_2$ and is correlated at least moderately
with $X_1$).
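
The size of the bias in this example can be illustrated with the usual
path-tracing result for standardized variables. The sketch below is an
illustration under stated assumptions rather than an analysis from the
source: Z is taken to cause both measured variables, disturbances are
assumed uncorrelated with the causes, and the coefficient values
(.30, .40, .40) and function names are hypothetical.

```python
def implied_correlation(p21: float, p1z: float, p2z: float) -> float:
    """Population correlation between standardized X1 and X2.

    Assumed model: X1 = p1z*Z + e1 and X2 = p21*X1 + p2z*Z + e2, with the
    disturbances uncorrelated with Z and X1. Tracing paths from X1 to X2
    gives the direct path (p21) plus the path through the common cause Z.
    """
    return p21 + p2z * p1z


def bias_when_z_omitted(p21: float, p1z: float, p2z: float) -> float:
    """Bias in the estimate of p21 when X2 is regressed on X1 alone.

    With one standardized predictor, the estimated path coefficient equals
    the X1-X2 correlation, so the bias is the spurious component p2z*p1z.
    """
    return implied_correlation(p21, p1z, p2z) - p21


# Hypothetical "moderate" effects, chosen only for illustration.
naive_estimate = implied_correlation(p21=0.30, p1z=0.40, p2z=0.40)
bias = bias_when_z_omitted(p21=0.30, p1z=0.40, p2z=0.40)
```

Under these assumed values the correlation is well above zero, so a test
of complete spuriousness would tend to reject, yet the estimate of the
path coefficient is still inflated by the product of the two Z paths.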

Third, and finally, as discussed by Billings and Wroten (1978), and as
implied in the decision steps, the unmeasured variables problem can be
at least partially negated by attending first to effects and then to
causes. Better yet, however, is to base the initial identification of
effects and causes on a logical, reciprocal interaction between effects
and causes. Specifically, if one wishes to examine only specific causes
(e.g., leader behaviors in the context of path-goal theory), then it is
desirable, if possible, carefully to refine the effects so that they
reflect only the causal variables of interest. Not only will this
procedure reduce the likelihood of an unmeasured variables problem, but
it might also provide a much needed stimulus for more thoughtful
criterion research.

1 This statement should not be confused with tests of logical
consistency, in which variables are in fact measured but not included in
specific path equations.

Variables and Individual Differences

Statistical rationale is presented for relating situational variables
(e.g., technological complexity) to person variables (e.g.,
environmental perceptions, attitudes). A procedure is described wherein
correlations are determined between a person variable and one or more
situational variables after the scores on the situational variables have
been assigned to individuals. The results of the procedure provide
opportunities to ascertain (a) the degree to which variation among
individuals on a person variable is associated with situational
differences, and (b) the degree to which a situational variable accounts
for the total possible variation in the person variable that is
associated with between-group differences.

The degree to which individual differences in factors such as climate
perceptions, attitudes, and behaviors are associated with differences in
work situations has received increasing attention (cf. Adams, Laker, &
Hulin, 1977; Herman & Hulin, 1972; Herman, Dunham, & Hulin, 1975;
James & Jones, 1976; Jones & James, 1979; Lawler, Hall, & Oldham, 1974;
Mowday, Porter, & Dubin, 1974; Newman, 1975; O'Reilly & Roberts, 1975;
Payne & Mansfield, 1973; Payne & Pugh, 1976; Roberts, Hulin, & Rousseau,
1978; Rousseau, 1977, 1978a, 1978b; Stone & Porter, 1975). Estimates of
person-situation associations are frequently based on "between-group"
analyses, where membership in a particular situation (e.g., job type,
work group, functional specialty, organization) is used as the
independent variable (dummy variables in multiple regression,
classification factors in ANOVA and multiple discriminant analysis), and
scores on one or more individual difference variables, or person
variables (PVs), are employed as the dependent variables. Using various
forms of the general linear model, estimates of variance accounted for
in the PV(s) by "group membership" (e.g., membership in different
organizations) are reported in the form of an eta-square, omega-square,
intraclass correlation, squared multiple correlation, or multivariate
analogs, such as a redundancy coefficient.

While this type of analysis reflects the amount of variation in one or
more PVs associated with group membership, it is also the case that the
independent variable, group (situation), typically does not identify
specific aspects of the situations represented that are associated with
the variations in the PV (Firebaugh, 1979; James & Jones, 1976). This
statement is perhaps more applicable to extremely general between-group
designators (e.g., work group, without reference to type of work group)
than to more specific between-group designators (e.g., job type or
functional specialty). Nevertheless, a between-group designator such as
job type is only an indirect indicator of specific situational
variables, such as job complexity, role requirements, and reward
structure.

Recently, emphasis has been placed on measuring specific situational
variables and relating these variables to PVs (cf. Jones & James, 1979;
Rousseau, 1978b). For example, in each of these studies, measures of
specific, subunit situational variables (e.g., technology, and
centralization and formalization of structure for divisions/departments)
were related to individuals' perceptions of job characteristics. The
analytic procedure was also the same: all individuals in a particular
division (department) were assigned the same scores on the situational
variables, and then the situational scores were correlated with
individuals' perceptions of job characteristics (the PVs) on the
individual sample. It is important to note that (a) the desired level of
analysis in both studies was the individual, and (b) it was assumed that
the situational variables were homogeneous for all individuals in a
particular division or department (see Roberts et al., 1978,
pp. 106-10*, for a discussion of homogeneity).

The information provided by relating specific situational variables to
PVs, following assignment of the situational scores to individuals,
should be superior to the information provided by between-groups
analysis because the investigator now has an empirical basis for
attempting to explain what it is about work environments that is
associated with the PVs (James & Jones, 1976; Roberts et al., 1978).
However, it is also the case that this procedure will likely result in a
loss of predictive power in comparison to the between-groups procedure,
which employs only group membership as a predictor. This is because the
between-groups procedure identifies all reliable variation in a PV that
is associated with between-group differences, while the use of specific
situational variables generally involves only a subset of the variables
that are associated with between-group differences in the PV. Thus, a
salient question is: To what extent does the magnitude of the
relationship between one or more situational variables and a PV approach
the magnitude of the relationship between the PV and between-group
differences (for the sample studied)? In effect, this question may be
viewed as one which asks for an assessment of the degree to which
reliable between-group variance in a PV remains to be accounted for
after the measured situational variables are considered.

The primary objective of this article is to present a statistical
rationale for relating a PV to one or more situational variables and for
determining the extent to which the obtained relationship approaches the
(maximum) variation in a PV that is associated with between-group
differences. Univariate procedures are presented initially, primarily to
simplify the discussion, and are followed by an extension to the
multivariate case. An empirical illustration is presented.

Statistical Rationale for Relating Situational Variables to Person Variables

For illustrative purposes, the following conditions were assumed:

(1) $S_i$ is a continuously distributed situational variable (e.g.,
technological complexity), on which each of $k$ groups (e.g., job types,
work groups, departments, divisions, organizations) has a unique score
($i = 1, 2, \ldots, k$), although some groups may have the same score as
other groups. When all individual members of the same group are assigned
the same value of $S$ for that group, the designation $S_{ij}$ is used,
where $j$ represents the $j$th individual in a group comprised of $n_i$
individuals ($j = 1, 2, \ldots, n_i$). It is assumed that the $S_i$ for
each group is homogeneous for all $n_i$ individuals.

(2) $Y_{ij}$ is the $j$th individual's score in the $i$th group on the
person variable (PV). Note especially that the $Y_{ij}$ are not
constrained to be equal for all $n_i$ individuals in the $i$th group.

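When the $S_{ij}$ and $Y_{ij}$ are each expressed in grand-mean
deviation form ($s_{ij}$, $y_{ij}$: deviations from the grand mean of
all individuals across all groups), the correlation between $S_{ij}$ and
$Y_{ij}$ can be expressed as follows:

$$
r_{ys} = \frac{(1/N) \sum_i \sum_j s_{ij} y_{ij}}
              {\left[ (1/N) \sum_i \sum_j y_{ij}^2 \right]^{1/2}
               \left[ (1/N) \sum_i \sum_j s_{ij}^2 \right]^{1/2}},
\tag{1}
$$

where $N = \sum_i n_i$. This equation may be expressed in somewhat
different form by noting that all $s_{ij}$ in group $i$ are identical
and thus $\sum_j s_{ij} = n_i s_i$ (and $\sum_j s_{ij}^2 = n_i s_i^2$),
and that $\sum_j y_{ij} = n_i \bar{y}_i$. Hence,

$$
r_{ys} = \frac{(1/N) \sum_i n_i s_i \bar{y}_i}
              {\left[ (1/N) \sum_i \sum_j y_{ij}^2 \right]^{1/2}
               \left[ (1/N) \sum_i n_i s_i^2 \right]^{1/2}}
       = \sigma_{\bar{y}s} / (\sigma_y \sigma_s).
\tag{2}
$$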

Equations (1) and (2) demonstrate that the correlation between $S_{ij}$
and $Y_{ij}$ on the total individual sample, $r_{ys}$, is a function of
(a) the covariance between the weighted group means on the PV and each
group's score on the situational variable, and (b) the standard
deviations of the PV and the situational variable on the total
individual sample. Of interest are the facts that $r_{ys}$ will only
achieve absolute values greater than zero when (a) variation exists
among the group $S_i$ scores, (b) there is comparatively more
between-group variation in the $Y_{ij}$ than within-group variation, and
(c) the group mean $Y_{ij}$'s covary with the $S_i$. The differential
weighting due to different $n_i$ may become a confounding factor if the
group $n_i$ are substantially different (i.e., larger groups will have
stronger effects on $r_{ys}$), and thus caution should be used when
analyses employ groups with large differences in group sample sizes.
Nevertheless, Eq. (2) makes clear the fact that $r_{ys}$ reflects the
extent to which group (mean) differences on a PV tend to covary with
group scores on a situational variable, relative to individual variation
on the PV and between-group variation on the situational variable. This
connotes that $r_{ys}$ may be interpreted as an association between a
situational variable and a PV.
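
The equivalence of the individual-level and group-level forms can be
checked numerically. The sketch below uses hypothetical toy data (three
groups of unequal size) and hypothetical function names; it is meant only
to show that correlating assigned scores over individuals gives the same
value as working from group sizes, group scores, and group means.

```python
from math import sqrt

# Hypothetical toy data: one (S_i, [Y_i1, ..., Y_in_i]) pair per group.
groups = [(1.0, [2.0, 3.0, 4.0]),
          (2.0, [3.0, 5.0]),
          (4.0, [4.0, 6.0, 7.0, 5.0])]


def r_individual(groups):
    """Eq. (1): assign S_i to every member, then correlate over individuals."""
    s = [s_i for s_i, ys in groups for _ in ys]
    y = [y_ij for _, ys in groups for y_ij in ys]
    n = len(y)
    mean_s, mean_y = sum(s) / n, sum(y) / n
    cov = sum((a - mean_s) * (b - mean_y) for a, b in zip(s, y)) / n
    sd_s = sqrt(sum((a - mean_s) ** 2 for a in s) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sd_s * sd_y)


def r_grouped(groups):
    """Eq. (2): the same correlation from n_i, S_i, and group means of Y."""
    n = sum(len(ys) for _, ys in groups)
    mean_s = sum(len(ys) * s_i for s_i, ys in groups) / n
    mean_y = sum(sum(ys) for _, ys in groups) / n
    # Covariance of group scores with n_i-weighted group means.
    cov = sum(len(ys) * (s_i - mean_s) * (sum(ys) / len(ys) - mean_y)
              for s_i, ys in groups) / n
    sd_s = sqrt(sum(len(ys) * (s_i - mean_s) ** 2 for s_i, ys in groups) / n)
    # The Y standard deviation still uses every individual score.
    sd_y = sqrt(sum((y_ij - mean_y) ** 2
                    for _, ys in groups for y_ij in ys) / n)
    return cov / (sd_s * sd_y)


r1 = r_individual(groups)
r2 = r_grouped(groups)
```

Note that only the numerator and the situational standard deviation
collapse to group-level sums; the PV standard deviation retains
within-group variation, which is why larger within-group spread lowers
the correlation.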

Our next concern is the degree to which $r_{ys}^2$ approaches the
maximum variation in the PV that is associated with between-group
differences. To achieve this goal, it is necessary to determine both the
total amount of variation in the PV that is associated with
between-group differences and that portion of this total variation that
is associated with differences in $S_i$. The determinations of these
variations are rather easily achieved by first deriving an equation for
the correlation between the $S_i$ and the weighted group means on
$Y_{ij}$. This correlation, designated $r_{\bar{y}s}$, is as follows:

$$
r_{\bar{y}s} = \frac{(1/N) \sum_i n_i s_i \bar{y}_i}
                    {\left[ (1/N) \sum_i n_i \bar{y}_i^2 \right]^{1/2}
                     \left[ (1/N) \sum_i n_i s_i^2 \right]^{1/2}}
             = \sigma_{\bar{y}s} / (\sigma_{\bar{y}} \sigma_s),
\tag{3}
$$

where $\sigma_{\bar{y}s}$ is the covariance between the weighted group
means on the $y_{ij}$ and the situational variable $s_i$;
$\sigma_{\bar{y}}$ is the standard deviation of the weighted group means
on $y_{ij}$; and $\sigma_s$ is the standard deviation of the $s_i$ in
the total sample.

Comparison of Eq. (3) with Eq. (2) demonstrates that the numerators are
the same, as are the standard deviations for the situational variable in
the denominator. However, the remaining terms in the denominators,
$\sigma_y$ (Eq. (2)) and $\sigma_{\bar{y}}$ (Eq. (3)), can generally be
assumed to be unequal given that the PV, $y_{ij}$, would usually be
expected to vary among individuals in the same group.

If Eqs. (2) and (3) are each solved for the covariance terms, and the
solutions set equal to each other, we have the following:

$$
r_{ys} \sigma_y \sigma_s = r_{\bar{y}s} \sigma_{\bar{y}} \sigma_s,
$$

which, after solving for $r_{ys}$ and squaring all terms, is

$$
r_{ys}^2 = r_{\bar{y}s}^2 \left( \sigma_{\bar{y}}^2 / \sigma_y^2 \right).
\tag{4}
$$

Furthermore, $\sigma_{\bar{y}}^2 / \sigma_y^2$ is $\eta_y^2$, the
squared correlation ratio (eta square) of $Y_{ij}$ on group membership.
Thus, Eq. (4) is

$$
r_{ys}^2 = \eta_y^2 \, r_{\bar{y}s}^2,
\tag{5}
$$

where $r_{ys}^2$ is the proportion of the variance in a PV associated
with situational variable $S_i$; $\eta_y^2$ is the total amount of
variation in the PV that is associated with between-group differences;
and $r_{\bar{y}s}^2$ is the variance in the weighted group mean PV
scores that is associated with differences in the situational variable
$S_i$.

Viewed from another perspective, $\eta_y^2$ is the maximum possible
variation in the PV that is associated with between-group differences.
$r_{ys}^2$ will be equal to $\eta_y^2$ only in the condition that
$r_{\bar{y}s}^2 = 1.0$, which can be seen in Eq. (5). Note that
$r_{\bar{y}s}^2$ will be less than 1.0, and therefore
$r_{ys}^2 < \eta_y^2$, when (a) the relationship between the $\bar{y}_i$
and $s_i$ is nonlinear, and/or (b) between-group variation exists in the
$\bar{y}_i$ that is not associated with $s_i$ (see Eq. (3)). Assuming
relationships to be linear, which can be checked empirically, we see
that $r_{\bar{y}s}^2$ represents the proportion of variation in
$\eta_y^2$ that is included in $r_{ys}^2$. In other words,
$r_{\bar{y}s}^2$ indicates the degree to which the obtained $r_{ys}^2$
approaches the maximum possible variation in a PV associated with
between-group differences.

This is seen simply by converting Eq. (5) to

$$
r_{\bar{y}s}^2 = r_{ys}^2 / \eta_y^2.
\tag{6}
$$
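
The decomposition can be verified on any data set in which every member
of a group carries the same situational score. The following sketch uses
hypothetical toy data and a hypothetical function name; it computes the
three quantities from sums of squares and lets the reader confirm that
the individual-level squared correlation equals eta square times the
group-level squared correlation.

```python
def decompose(groups):
    """Return (r_ys^2, eta_y^2, r_ybar_s^2) for (S_i, [Y_ij]) groups."""
    n = sum(len(ys) for _, ys in groups)
    grand_y = sum(sum(ys) for _, ys in groups) / n
    grand_s = sum(len(ys) * s_i for s_i, ys in groups) / n

    # Total and between-group sums of squares on the person variable.
    ss_total = sum((y - grand_y) ** 2 for _, ys in groups for y in ys)
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand_y) ** 2
                     for _, ys in groups)
    # Sum of squares on S and the S-by-group-mean cross product,
    # both weighted by group size n_i.
    ss_s = sum(len(ys) * (s_i - grand_s) ** 2 for s_i, ys in groups)
    sp = sum(len(ys) * (s_i - grand_s) * (sum(ys) / len(ys) - grand_y)
             for s_i, ys in groups)

    r2_ys = sp ** 2 / (ss_total * ss_s)        # individual level
    eta2 = ss_between / ss_total               # correlation ratio
    r2_ybar_s = sp ** 2 / (ss_between * ss_s)  # weighted group means
    return r2_ys, eta2, r2_ybar_s


# Hypothetical toy data: (S_i, [Y_i1, ..., Y_in_i]) per group.
toy = [(1.0, [2.0, 3.0, 4.0]),
       (2.0, [3.0, 5.0]),
       (4.0, [4.0, 6.0, 7.0, 5.0])]
r2_ys, eta2, r2_ybar_s = decompose(toy)
```

Since the group-level squared correlation cannot exceed 1.0, the
individual-level value can never exceed eta square, which is the sense in
which eta square acts as a ceiling.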

It is important to note that meaningful interpretation of
$r_{\bar{y}s}^2$ requires careful attention to the values of $r_{ys}^2$
and $\eta_y^2$ inasmuch as $r_{\bar{y}s}^2$ may assume high values that
are essentially meaningless. To illustrate, if $r_{ys}^2 = .010$ and
$\eta_y^2 = .011$, then $r_{\bar{y}s}^2 = .909$. The value of .909
suggests that $r_{ys}^2$ did in fact approach the maximum possible
variation in Y associated with between-group differences, as reflected
by $\eta_y^2$. However, an $\eta_y^2$ of .011 indicates that essentially
none of the variation in Y was associated with between-group differences
in the first place. That is, the variation in Y almost exclusively is
within-group variance, and all that the $r_{\bar{y}s}^2$ of .909
indicates is that one has accounted for approximately 91% of, in effect,
nothing.

On the other hand, $\eta_y^2$ (and $r_{ys}^2$) may assume reasonably
high values, in which case the information provided by Eq. (6) is
salient. A straightforward use of this information would be to ascertain
whether additional situational variables should be added to a study in
the interest of accounting for reliable variance that still remains
between groups. That is, $1 - r_{\bar{y}s}^2$ indicates the proportion
of between-group variation in the PV that is not accounted for by the
situational variable $S_i$. If $1 - r_{\bar{y}s}^2$ is not equal to
zero, then the indication is that additional situational variables are
needed in the analysis.
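
A small helper makes it harder to report the ratio from Eq. (6) without
the eta square that gives it meaning. This is a sketch only; the function
name and the returned keys are hypothetical, and the input values are the
ones used in the caution above.

```python
def between_group_summary(r2_ys: float, eta2_y: float) -> dict:
    """Report Eq. (6) together with the quantities needed to interpret it."""
    share = r2_ys / eta2_y  # proportion of between-group variation accounted for
    return {
        "eta2_y": eta2_y,          # ceiling: between-group variation in the PV
        "r2_ys": r2_ys,            # variation associated with the situational variable
        "share": share,            # Eq. (6)
        "remaining": 1.0 - share,  # between-group variation not accounted for
    }


# Values from the cautionary example in the text.
summary = between_group_summary(r2_ys=0.010, eta2_y=0.011)
share_rounded = round(summary["share"], 3)
```

Returning eta square alongside the ratio keeps the cautionary case
visible: a share near .91 accompanies a ceiling of only .011.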

The inclusion of additional situational variables in the analysis means
that the univariate correlation analysis will be replaced with a
multiple correlation analysis. The transfer to a multiple correlation
paradigm is easily achieved: the preceding logic extends directly to
multiple correlation analyses based on two or more situational variables
which have the same values for all individuals in each group. That is,
it can be shown that the squared multiple correlations are related by
$R_{ys}^2 = \eta_y^2 R_{\bar{y}s}^2$, where $R_{ys}^2$ represents the
squared multiple correlation between one PV and two or more continuously
distributed situational variables. Using the same logic as above,
$R_{\bar{y}s}^2$, the squared multiple correlation between the weighted
group means on the PV and the situational variables, indicates the
degree to which $R_{ys}^2$ approaches the maximum variation in the PV
that is associated with between-group differences, as reflected by
$\eta_y^2$.
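
The multiple correlation relationship can also be checked numerically for
the two-predictor case. The sketch below relies on the standard formula
for a squared multiple correlation with two predictors; the toy data,
variable names, and function names are hypothetical and serve only to
illustrate the identity.

```python
from math import sqrt

# Hypothetical toy data: (S_1i, S_2i, [Y_i1, ..., Y_in_i]) per group.
groups = [(1.0, 2.0, [2.0, 3.0, 4.0]),
          (2.0, 1.0, [3.0, 5.0]),
          (4.0, 3.0, [4.0, 6.0, 7.0, 5.0]),
          (3.0, 5.0, [1.0, 2.0, 2.0])]


def corr(x, y, w):
    """Pearson correlation with frequency weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(vx * vy)


def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Individual level: every member carries the group's S scores.
s1 = [g[0] for g in groups for _ in g[2]]
s2 = [g[1] for g in groups for _ in g[2]]
y = [v for g in groups for v in g[2]]
ones = [1.0] * len(y)
R2_ys = r2_two_predictors(corr(y, s1, ones), corr(y, s2, ones),
                          corr(s1, s2, ones))

# Group level: group means of Y, weighted by group size n_i.
n_i = [len(g[2]) for g in groups]
ybar = [sum(g[2]) / len(g[2]) for g in groups]
g1 = [g[0] for g in groups]
g2 = [g[1] for g in groups]
R2_ybar_s = r2_two_predictors(corr(ybar, g1, n_i), corr(ybar, g2, n_i),
                              corr(g1, g2, n_i))

# Eta square of Y on group membership.
grand = sum(y) / len(y)
eta2 = (sum(n * (m - grand) ** 2 for n, m in zip(n_i, ybar))
        / sum((v - grand) ** 2 for v in y))
```

The correlation between the two situational variables is identical at
both levels because neither varies within groups, so only the
correlations involving the PV are scaled by eta.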

It is noteworthy that some portion of the between-group variation
reflected by $\eta_y^2$ might not be limited to strictly situational
attributes. For example, a part of the between-group variation in the PV
might reflect group mean differences in individual difference variables
such as age, education, experience, socioeconomic status, and so forth.
This suggests that $R_{\bar{y}s}^2$ might not achieve a value of 1.0 by
adding only situational variables to the analysis. Consequently, it
would be informative to ascertain whether group mean differences on
individual difference variables account for between-group variation in
the PV before effort is extended to identify additional situational
variables to include in the analysis.

The most straightforward approach for addressing this issue is to
compute group means on the individual difference variables that are
believed to be related to group differences (i.e., explain between-group
variation in the PV) and to treat the means statistically as if they
were situational variables (i.e., assign the group means for a group to
all individuals in the group). In effect, the analytic procedure would
consist of regressing the PV on both the measured situational variables
and the group means of the individual difference variables (IDVs),
following the procedures outlined for situational variables. If the
$R_{\bar{y}s}^2$ provided by this analysis is less than 1.0, then the
indication is that additional situational variables are needed.

It is important to note that the use of group means on IDVs is
recommended only as a statistical heuristic for ascertaining whether
additional situational variables are needed in an analysis. This is
because the group mean scores on the IDVs are, in general, "fictional
variables" with respect to individual group members, and thus cannot be
interpreted meaningfully as "group variables" in the analysis. To be
sure, within the context of a particular theory, a group mean score on
an IDV might be interpreted meaningfully and employed and interpreted
just like any other situational (group) variable. However, the group
mean on an IDV will generally lack theoretical import and thus should be
employed only as a statistical heuristic to ascertain if additional
situational variables might be included in a study.

An Illustration

To illustrate the use of the above rationale, one set of data was
selected from an ongoing research study (Hater, Note 1). The data
included (a) subordinates' perceptions of interdepartmental conflict
($Y_{ij}$) on the part of 124 high level, technical personnel in an
information systems department in a private health care foundation
(e.g., systems analysts) and (b) measures of work group centralization
of decision making ($S_{1i}$, where the first subscript connotes
situational variable number) and work group formalization of work roles
($S_{2i}$). Separate measures of $S_{1i}$ and $S_{2i}$ were obtained for
each of the 19 work groups in which the 124 subordinates were employed
(work group supervisors provided the $S_{1i}$ and $S_{2i}$ scores). A
one-way ANOVA, using the 19 work groups as the independent variable
(classification factor) and the perceptions of interdepartmental
conflict as the dependent variable, resulted in an $\eta_y^2$ of .26
(p < .05). This connotes that 26% of the variance in perceptions of
interdepartmental conflict was associated with between-group variations
in the 19 work groups.

The squared correlations between the two situational variables and
perceptions of interdepartmental conflict are presented in column one of
Table 1 under univariate analysis (i.e., the $r_{ys}^2$ column).
Following prior discussion, the correlations were computed by assigning
each individual in group $i$ ($i = 1, \ldots, 19$) the same $S_{1i}$ and
$S_{2i}$ scores, and then correlating the $Y_{ij}$ and $S_{1i}$ and
$S_{2i}$ scores on the total (i.e., across group) subordinate sample.
Before squaring, the correlations were significant and positive. The
positive correlations suggest that individuals in high level technical
jobs, which require a certain degree of flexibility, autonomy, and
boundary spanning, are likely to perceive a lack of cooperation and more
conflicts among organizational departments when decision making
processes are constrained by centralized and formalized structures (cf.
James & Jones, 1976).

TABLE 1

Relationships between Subordinates' Perceptions of Interdepartmental
Conflict and Centralization of Decision Making and Formalization
of Work Roles

Situational variable                                  Relationship

Univariate analysis                           $r_{ys}^2$   $r_{\bar{y}s}^2$
  Centralization of decision making ($S_{1i}$)   .05*          .19
  Formalization of work roles ($S_{2i}$)         .07**         .27

Multiple correlation analysis                 $R_{ys}^2$   $R_{\bar{y}s}^2$
  $S_{1i}$, $S_{2i}$                             .10**         .38

Note. All analyses based on individual subordinate sample (N = 124).
* p < .05
** p < .01

The $r_{\bar{y}s}^2$ column in Table 1 under univariate analysis
indicates the proportion of total variation in subordinates' perceptions
of interdepartmental conflict that was both (a) associated with
between-group differences and (b) accounted for by either centralization
or formalization (the relationships were linear). For example,
centralization of decision making accounted for 19% of that variance in
interdepartmental conflict that was associated with between-group
differences. Consequently, 81% of the variance in the perceptions that
was associated with between-group differences was not accounted for by
centralization (i.e., $1 - r_{\bar{y}s}^2$). It is important to note
that $r_{\bar{y}s}^2$ need not be calculated directly. One only needs to
calculate $\eta_y^2$, each $r_{ys}^2$, and then divide each $r_{ys}^2$
by $\eta_y^2$ (see Eq. (6)). In addition, "accounted for" is used only
in the statistical sense, and does not imply causal attribution of
variance.

The lower part of Table 1 presents the results of the multiple
correlation analysis. Following assignment of scores to individuals,
centralization and formalization were correlated .30 (N = 124 subjects,
p < .01), which connotes that the values of the $r_{ys}^2$'s from the
univariate analysis could not simply be added to obtain an estimate of
the variance in $Y_{ij}$ associated with the combined situational
variables. The squared multiple correlation, $R_{ys}^2$, again computed
on the subordinate sample, was .10 (p < .01). Division of $R_{ys}^2$ by
$\eta_y^2$, which provided $R_{\bar{y}s}^2$, yielded .38 (i.e.,
.10/.26), suggesting that 38% of the variation in subordinates'
perceptions of interdepartmental conflict that was associated with
between-group differences was accounted for by a linear combination of
centralization and formalization.
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Since  the  relationships  among  the  variables  were  linear,  the  results  of 
the  analysis  above  indicates  clearly  that  additional,  between-group  pre¬ 
dictors  are  needed  in  the  study.  That  is,  based  on  1  -  /?g,  62 %  of  the 
between-group  variation  in  the  perceptions  remains  to  be  accounted  for. 
We  believe  this  is  worth  knowing!  It  should  also  be  noted  that  if  the 
differences  between  and  R[  reflect  nonlinearity,  various  forms  of 
polynominal  regression  or  moderator  analysis  would  be  indicated 
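
The division reported above can be written out as a short worked equation.
The subscripts are labels assumed for this illustration ($y$ for the
squared multiple correlation computed on individuals, $g$ for its ratio to
the eta square); the values .10 and .26 are the ones reported in the text.

```latex
R_g^2 = \frac{R_y^2}{\eta^2} = \frac{.10}{.26} \approx .38,
\qquad
1 - R_g^2 \approx .62
```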

SUMMARY  AND  CONCLUSIONS 

The primary objectives of this article were to present statistical
rationales for relating a person variable to one or more situational
variables, following assignment of scores on the situational variables to
individuals, and for determining the degree to which the obtained
relationship approaches the maximum variation in a person variable that is
associated with between-group differences. It was shown that the
correlation between a situational variable and a PV was a function of
between-group variation on the PV, in relation to within-group variation,
and covariation of the group means on the PV with the group scores on the
situational variable. It was also shown that the squared correlation
between a continuously distributed situational variable and a PV could be
decomposed into (a) an eta square, which is the maximum variation in a PV
associated with between-group differences, and (b) the squared correlation
between the weighted group means on the PV and the situational variable
($r_g^2$). This decomposition had the important implication that $r_g^2$
reflects the degree to which the obtained $r_y^2$ approaches the maximum
variation in a PV associated with between-group differences, as measured
by $\eta^2$.
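
The decomposition summarized above can be checked numerically. The sketch
below is only an illustration on simulated data, not the authors'
procedure: the function name `decompose`, the group sizes, and the symbols
`r_y2`, `eta2`, and `r_g2` are assumptions introduced here. It assigns a
group-level score to every member of each group and confirms that the
squared individual-level correlation equals the eta square multiplied by
the squared correlation between the size-weighted group means and the
situational variable.

```python
import numpy as np

def decompose(y, s, groups):
    """Return (r_y2, eta2, r_g2) for a PV y and a group-level variable s."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    groups = np.asarray(groups)
    # Squared individual-level correlation between the PV and assigned scores.
    r_y2 = np.corrcoef(y, s)[0, 1] ** 2
    # Replace each person's PV score by the mean of his or her group.
    group_mean = np.empty_like(y)
    for g in np.unique(groups):
        mask = groups == g
        group_mean[mask] = y[mask].mean()
    # Eta square: between-group sum of squares over total sum of squares.
    grand = y.mean()
    eta2 = ((group_mean - grand) ** 2).sum() / ((y - grand) ** 2).sum()
    # Correlating the expanded group means with s weights groups by size.
    r_g2 = np.corrcoef(group_mean, s)[0, 1] ** 2
    return r_y2, eta2, r_g2

# Hypothetical data: six groups of unequal size, one situational variable.
rng = np.random.default_rng(0)
sizes = [10, 14, 8, 12, 16, 9]
groups = np.repeat(np.arange(len(sizes)), sizes)
s = rng.normal(size=len(sizes))[groups]      # group scores given to members
y = 0.5 * s + rng.normal(size=groups.size)   # person variable

r_y2, eta2, r_g2 = decompose(y, s, groups)
print(r_y2, eta2 * r_g2)   # the two numbers agree up to rounding error
```

Because the situational variable is constant within groups, its covariance
with the PV involves only the group means, which is why the product form
holds exactly rather than approximately.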

Extensions to the multivariate case were presented, and an application of
the procedures to empirical data was illustrated. Finally, as part of the
process of ascertaining whether additional situational variables are
needed in a study, it was recommended that group means on individual
difference variables (IDVs) which help explain between-group variation in
a PV be entered into the analysis. It was noted, however, that this
procedure generally served only as a statistical heuristic to determine
whether $R_g^2$ was less than 1.0 after the group means on the IDVs had
been entered into the analysis, in conjunction with the measured
situational variables. Only in the case that a group mean on an IDV has
theoretical relevance as a "group variable" should the mean be retained in
the analysis for interpretative purposes.

Several additional points deserve mention. First, a note of caution needs
to be offered concerning the number of situational variables (and group
means on IDVs in the analysis described above) employed as predictors in
relation to the number of groups. Ordinarily there should be many more
groups than situational variables. When this is not the case, the
interpretation of results must be guarded. For example, if there are only
two groups, a single situational variable whose value differs for the two
groups will serve as an identifier of membership in the groups and will
account fully for the between-group variation of a PV, irrespective of
whatever conceptual meaning may be reserved for the situational variable.
In general, if there are k - 1 situational variables (where k is the
number of groups), and none of these variables can be perfectly predicted
linearly by one or more of the remaining situational variables, $R_g^2$
will always be equal to 1.00. In such a case, the set of situational
variables merely serves to identify the membership in groups and will
always yield $R_y^2 = \eta^2$ and thus $R_g^2 = 1.0$. The same would be
true for a set of randomly generated situational variables (cf. Cohen &
Cohen, 1975), and thus it should be clear that as the number of
situational variables approaches or reaches the number of groups minus one
(k - 1), the closeness of $R_y^2$ to $\eta^2$ has lesser relevance to the
substantive import of the situational variables and more relevance to
their role as identifiers of group membership. The foregoing is of little
concern when the number of groups is very large in comparison to the
number of situational variables, but in some studies this may not be the
case.¹
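
The caution about k - 1 situational variables can be illustrated with
random numbers. The following is a minimal sketch under assumed conditions
(five groups of 20, a PV that is pure noise, and the helper names
`r_squared` and `eta_squared`, all hypothetical): four randomly generated
group-level "situational" scores reproduce the eta square exactly, whereas
a single one does not.

```python
import numpy as np

def r_squared(y, X):
    """Squared multiple correlation of y on the columns of X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def eta_squared(y, groups):
    """Between-group sum of squares divided by total sum of squares."""
    grand = y.mean()
    ssb = sum((groups == g).sum() * (y[groups == g].mean() - grand) ** 2
              for g in np.unique(groups))
    return ssb / ((y - grand) ** 2).sum()

rng = np.random.default_rng(1)
k, n_per_group = 5, 20
groups = np.repeat(np.arange(k), n_per_group)
y = rng.normal(size=groups.size)       # PV with no real group structure
scores = rng.normal(size=(k, k - 1))   # k - 1 random group-level variables
X_full = scores[groups]                # assign group scores to individuals

eta2 = eta_squared(y, groups)
ratio_full = r_squared(y, X_full) / eta2        # uses all k - 1 variables
ratio_one = r_squared(y, X_full[:, :1]) / eta2  # uses a single variable
print(ratio_full, ratio_one)
```

With k - 1 linearly independent group-level predictors plus an intercept,
the fitted values are simply the group means, so the ratio reaches 1.0
regardless of what the predictors mean substantively.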

Second, with purely correlational data, it is generally not meaningful to
attempt to infer that the variance attributions ($\eta^2$, $r_y^2$,
$r_g^2$, $R_y^2$, $R_g^2$) are causal. For example, James, Hater, Gent,
and Bruni (1978) and Roberts et al. (1978) discuss errors that evolve from
making causal attributions of variance in a PV to situational variables,
based on correlational data, when the true underlying causal model
involves reciprocal causation between persons and situations.

Finally, with the exception of $\eta^2$, we have focused exclusively on
continuously distributed situational variables, which reflects our bias
toward the use of parametric procedures whenever possible. However, the
rationale developed is equally applicable to categorical variables, where,
for example, a situational variable is operationalized in terms of
different types of training received. In this case, $R_y^2$ is determined
by the use of well-known dummy variable procedures (Cohen & Cohen, 1975),
or perhaps a mix of dummy variables and continuously distributed
variables, and the relationship $R_y^2 = \eta^2 R_g^2$ is applicable.

Hertzog, Christopher, and Nesselroade, John R. Beyond Autoregressive
Models: Some Implications of the Trait-State Distinction for the
Structural Modeling of Developmental Change. Child Development, 1987, 58,
93-109. The use of structural modeling techniques to fit change concepts,
including developmental ones, to repeated-measurements data has been
rather firmly but uncritically wedded to autoregressive model
specifications. The uncritical application of an autoregressive
specification to repeated measures does not take into account subtleties
of conceptions of stability and change (e.g., the trait-state distinction)
that are now recognized in the behavioral research literature. We review
the basic distinction between trait and state and examine the implications
of the different possibilities for modeling developmental phenomena. The
arguments are illustrated with empirical examples.


One of the primary arguments favoring longitudinal data is the utility of
time-structured observations for explaining causal relations among
variables that cannot be experimentally manipulated (see Biddle & Marlin,
1987, in this issue; Crano & Mendoza, 1987, in this issue; Dwyer, 1983;
Heise, 1975). Such is the case in studies of development, in which an
analysis of pertinent phenomena must proceed by observing
development-in-context, usually without the opportunity to intervene in
the developmental process. This state of affairs helps to explain the
enthusiasm for structural regression models of longitudinal measurement
evident among many developmental scientists (e.g., Nesselroade & Baltes,
1984; Schaie & Hertzog, 1985).

A very common and popular structural regression model for longitudinal
data is referred to as a first-order autoregressive model, meaning a model
in which variables are represented as causes of themselves over two points
in time (Dwyer, 1983; Joreskog & Sorbom, 1977; Kessler & Greenberg, 1981).
These models form the basis for techniques such as cross-lagged regression
analysis (Kenny, 1979; Rogosa, 1979) and have been argued to be optimal
modeling techniques for studying stability and change in developmental
applications (e.g., Joreskog, 1979; Schaie & Hertzog, 1985). Hertzog
(1985) reviewed these models and their utility for developmental analysis
in some detail.
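
For readers who prefer a concrete case, the sketch below simulates the
simplest manifest-variable form of a first-order autoregressive model,
y2 = beta * y1 + zeta, and recovers the autoregression coefficient by
ordinary least squares. It is an assumed toy example (sample size, beta =
.6, and unit variances are all hypothetical), not one of the latent-variable
models discussed in the sources cited above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 5000, 0.6

# Time 1 scores, standardized in the population.
y1 = rng.normal(size=n)
# Disturbance scaled so that the Time 2 variance also equals 1.
zeta = rng.normal(scale=np.sqrt(1 - beta ** 2), size=n)
# The variable at Time 2 is "caused" by itself at Time 1.
y2 = beta * y1 + zeta

# OLS slope of y2 on y1 (the autoregression coefficient).
beta_hat = np.cov(y1, y2)[0, 1] / np.var(y1, ddof=1)
# With equal variances at both occasions this is also the correlation.
stability = np.corrcoef(y1, y2)[0, 1]
print(beta_hat, stability)
```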

We have come to conclude that the rationale for the first-order
autoregressive model is implicitly based on a trait conception of the
variables in the model. By traits we mean relatively stable and permanent
attributes. The implication of our conclusion is that first-order
autoregressive models may be a poor way of representing change for
nontrait phenomena (states), that is, models of relations among fluctuant
attributes dependent upon temporary constellations of influences and
circumstances. Thus, the thesis of this article is that developmental
scientists need to differentiate more systematically two conceptions of
stability and change as they bear on the modeling of longitudinal data.
Subsequently, we will identify prototype classes of attributes of
individuals that pertain to the change/stability distinction.

We begin our discussion with the concepts of lability and stability and
the nature of the trait-state distinction. Next, we present results from
longitudinal analyses of mood state variables that reveal a covariance
structure among mood factors that is incongruent with a first-order
autoregressive model of change. Next, the possible explanations for this
incongruity from a state-oriented perspective are explored. Finally, we
discuss alternative methods for examining state and trait change models
and identify some of the critical features of research needed to test the
trait-state distinction and its implications for developmental science.

Lability  versus  Stability  in 
Longitudinal  Data 

Longitudinal research maintains a certain mystique, especially for
developmentalists. At both the manifest- and latent-variable levels
longitudinal designs are considered essential for testing notions of
stability and change (e.g., Baltes & Nesselroade, 1979). We agree, but
whether a given longitudinal design contributes to the understanding of
stability and change crucially depends on the validity of its use (Rogosa,
in press). In evaluating the validity of a particular longitudinal design,
we must recognize that stability or change is in fact an intraindividual
(within-person) phenomenon. In fact, the face validity of a longitudinal
design rests on the notion that change is fundamentally a property of the
individual unit of observation.

The primary definition of stability (lack of change) in the literature is
only indirectly a function of intraindividual change. Although there are
multiple and more differentiated definitions of stability (change) (Kagan,
1980; Mortimer, Finch, & Kumka, 1982), the most common ones refer to
unchanging mean levels over time (mean stability) and unchanging
distributions of individual differences over time (covariance stability).
Note that these two definitions refer to groups of individuals as a whole.
Primarily due to the research traditions of a nomothetic (and
trait-oriented) scientific worldview, there has been relatively little
attention paid to a third type of stability: intraindividual stability
(change within the given sampling unit). Instead, the first two types of
stability are usually studied in traditional longitudinal research.

Covariance stability, or stability of individual differences, is reflected
in the covariance of a variable with itself over two points in time. In
structural regression models, covariance stability is often translated
into a regression of a variable on itself in longitudinal data. These
"autoregression" coefficients may be termed "stability coefficients."
Covariance stability reflects the degree to which observed units show
similar change patterns. Conversely, low levels of stability reflect, in
Baltes and Nesselroade's (1973) term, "interindividual differences in
intraindividual change." However, the magnitude of the stability
coefficient depends both upon the intraindividual changes and upon the
magnitude of interindividual differences. Stability coefficients can be
high if (1) there are high levels of intraindividual change that are
consistent across individuals, (2) there is salient intraindividual change
only in a (relatively small) proportion of the sampled units, or (3)
meaningful amounts of intraindividual change are nevertheless small
relative to the magnitude of interindividual differences. Stability
coefficients are, therefore, summary statements about relative change in a
population of individuals. They are determined by, but should not be
equated with, intraindividual stability (i.e., no change).
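
The three conditions under which stability coefficients can be high even
though individuals change can be sketched with simulated scores. Everything
in the block below is assumed for illustration (500 cases, a Time 1 scale
with mean 50 and standard deviation 10, and the particular amounts of
change); it is not an analysis of any data set discussed here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
time1 = rng.normal(loc=50, scale=10, size=n)

# (1) Large change that is identical for everyone: rank order is untouched.
uniform_change = time1 + 20.0

# (2) Salient change in only a small proportion (5%) of the sampled units.
few_change = time1.copy()
few_change[: n // 20] += 15.0

# (3) Everyone changes, but by amounts that are small relative to the
#     spread of individual differences.
small_change = time1 + rng.normal(scale=2.0, size=n)

def stability(a, b):
    """Stability coefficient: correlation of a variable with itself over time."""
    return np.corrcoef(a, b)[0, 1]

for label, time2 in [("uniform", uniform_change),
                     ("few", few_change),
                     ("small", small_change)]:
    print(label, round(stability(time1, time2), 3))
```

In the first scenario every individual changes by 20 points, yet the
coefficient equals 1.0 apart from rounding, which is the sense in which a
high stability coefficient should not be equated with intraindividual
stability.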

Given the multiple influences on the magnitude of stability coefficients,
the interpretation of longitudinally observed measures as stable or
changeable is not a clear-cut matter. A further complication is that
attributions of stability seem to depend a great deal on the perspective
of the interpreter. For example, a stability coefficient of .60 over a
period of 5 years can be interpreted as high or low, depending upon both
psychometric concerns and one's theoretical orientation and expectations.

Nevertheless, longitudinal data are inherently more interesting to the
student of lability and stability than are cross-sectional data because
the former provide the necessary but not sufficient information for making
such judgments. Cross-sectional data do not provide direct evidence of
stability or lability at the intraindividual level. Rather, with
cross-sectional data, inferences concerning stability or lability must
rest on the putative nature of the variables that are measured. For
example, in the absence of retest information, one may be far more likely
to ascribe stability over lengthy intervals to general intelligence than
to affective attributes.

In designing longitudinal studies, a consideration of the putative nature
of the variables is, we believe, crucial to decisions regarding subsequent
analyses and ultimately to interpretive clarity. It is in this light that
we now discuss the trait-state distinction as a key organizing construct
in the conceptualization of developmental studies (Nesselroade, in press).


The  Trait-State  Distinction 

The distinction between states and traits has a relatively long history
that reaches back at least as far as Cicero (Eysenck, 1983). The
distinction refers to two different classes of attributes for describing
people. Traits, on the one hand, are attributes of individuals that are
relatively stable across occasions. For example, having two eyes,
practicing monogamy, or being an extrovert are traitlike attributes.
States, on the other hand, comprise attributes of individuals that are
relatively changeable in nature. Examples of statelike attributes include
hormonal levels, diurnal fatigue, and situational anxiety. A dichotomy of
trait and state may oversimplify the range of possibilities (Cattell,
1966; Nesselroade & Bartsch, 1977), but it suffices for our purposes here.

Interindividual differences.--Research flowing from the distinction
between trait and state, especially various attempts to render the
concepts operational, falls largely within the individual-differences
tradition. Thus, the working definitions of the concepts have tended to
focus on variation within and among individuals. We will try to draw the
distinction more sharply in individual-differences terms.

Traits, because of their putative stability, are potentially useful for
the purpose of discriminating between one individual and another without
having to consider intraindividual change (e.g., "Jones is brighter than
Smith"). As attributes that represent stable differences among
individuals, traits that are valid predictors of other attributes, such as
how one will react in a particular situation or performance at some task,
provide a basis for effective, long-range prediction. Moreover, they are
appropriate for inclusion in explanatory systems that involve distal as
well as proximal causes.

States most commonly represent dimensions of intraindividual change and
serve to discriminate one time or situation in the life of a person from
another (e.g., "Wilson was so happy yesterday but today he seems to be
depressed"). However, states also can represent differences among
individuals at one point in time, provided the individuals' state changes
are not in perfect synchrony (e.g., "Today, she was 'up' and he was
'down'").

Generally it is certainly the case that most psychological attributes will
be, strictly speaking, neither traits nor states. That is, attributes can
have both trait and state components (Nesselroade, in press). In the case
of hormonal cascades, a given person may be of a certain type (e.g.,
diabetic) or may have a characteristic "set-point" in a homeostatic
system. These aspects of the attribute would qualify as traits. Yet the
pattern of flux would be considered the statelike part of the attribute.
One might even wish to argue that intrinsic patterns of state variability
are themselves traits. In this sense, extraversion might be a trait, but
variations in gregariousness might be considered the statelike aspect of
the trait. Work in the domain of anxiety has provided ample evidence of
the utility of identifying trait and state components of anxious behavior
(e.g., Cattell & Scheier, 1967; Nesselroade, in press; Spielberger,
Gorsuch, & Lushene, 1969).

Misconceptions concerning trait and state.--At this point we must identify
and briefly disclaim common misconceptions that may be evoked by the terms
trait and state (see also Nesselroade, in press). First, our usage of the
term "trait" should not be construed as connoting immutable, genetically
determined behavioral dispositions. The conception of trait as employed
here includes stable behavioral dispositions, such as cigarette smoking or
chronic stress reactions, that can be modified but that typically remain
stable over long periods of time. One of the defining characteristics of a
trait is inertia. That is, a trait will remain the same unless and until
organismic or environmental influences act to change it. Stable,
unchanging environments promote stable behavioral dispositions, even if
those dispositions are potentially modifiable by environmental
intervention.

Second, a common misconception regarding states is that they are somehow
ephemeral, unpredictable, unreliably measured, and hence, uninteresting.
Negative attitudes toward studies of state phenomena probably derive in
part from conceptual confusion of stability and reliability (see below),
accompanied perhaps by the assumption that fluctuant attributes have
little predictive validity or explanatory power.

Because of the relative lability of state differences among persons,
states are more difficult to use than traits for prediction, particularly
in traditional schemes that base predictions solely on distributions of
interindividual differences. To predict from trait levels requires the
individual's trait score and knowledge of the form of the relationship
between trait (predictor variable) and criterion. Predicting from state
levels, however, requires some understanding of environmental
contingencies and future environmental conditions. In essence, what one
must do is capitalize on traitlike aspects of state dimensions to make
predictions based on state information. For instance, to know that someone
tends to get anxious in a certain situation and to know how that person
responds when anxious can yield a prediction concerning what the person
will do when placed in that anxiety-eliciting situation. The explanatory
power of many behavioral theories might be greatly enhanced if they
explicitly considered situational characteristics as they interact with
individuals' psychological states. For example, Endler, Hunt, and
Rosenstein (1962) incorporated these ideas in the assessment of anxiety.
In the cognitive domain, research on state-dependent learning and memory
phenomena indicates that mood or state at the time of encoding information
can influence the nature of recall at some later time (e.g., Bower, 1981).
Knowledge of a person's state at the time of learning can, therefore,
enhance the predictability of his or her recall performance.

Discriminating trait and state.--The trait-state distinction underscores
the idea that the differences existing among individuals at one point in
time may well be a function of both stable and labile attributes.
Therefore, in using covariation techniques such as structural equation
modeling that capitalize on individual differences in data, one must be
alert to the fact that the variation that is being analyzed potentially
reflects latent variables of differing temporal characteristics such as
states and traits.

Longitudinal  Characteristics 
of  State  Measures 

At this point two empirical examples of longitudinally assessed
characteristics of state measures will be briefly presented to resolve
three common misconceptions about states that arise because of their
intraindividual lability: (1) their measurement structure will be
unstable, (2) they will show low internal consistency, and (3) they will
not correlate with each other in a consistent manner. We believe the
rectification of these misconceptions is highly germane to the utilization
of longitudinal data on psychological states.

Older adults' data.--The first set of data pertinent to these issues
consisted of self-report measures on five state dimensions (anxiety,
stress, depression, regression, and fatigue) obtained on 111 older adults
at two occasions of measurement. Approximately 1 month elapsed between the
two measurement occasions. These data were reported by Nesselroade,
Mitteness, and Thompson (1984), who found that the anxiety and fatigue
indicators formed well-defined, positively intercorrelated latent
variables.

We reanalyzed the Nesselroade et al. (1984) anxiety and fatigue data. The
indicators used were: (1) the Anxiety scale of the 8-State Questionnaire
(8SQ), Form A (Curran & Cattell, 1976), (2) the Anxiety scale of the 8SQ,
Form B, (3) the Spielberger State Anxiety Scale (Spielberger et al.,
1969), and (4) three four-item packets taken from the Fatigue scale of the
8SQ, Form A. Nesselroade et al. (1984) showed that the three Anxiety
scales and the three Fatigue subscales formed latent variables of Anxiety
and Fatigue, respectively. They also showed that these latent variables
exhibited invariant factor loadings across the two longitudinal occasions.

The 8SQ Anxiety scales, Forms A and B, were designed to be parallel forms,
having equal true-score variances and equal measurement-error variances
(see Lord & Novick, 1968). The psychometric assumptions of parallelism can
be translated into a set of testable hypotheses regarding the covariance
structure of the measures (Joreskog, 1971, 1974; Werts, Breland, Grandy, &
Rock, 1980). The first goal of the reanalysis was to show that measurement
of labile states does not imply labile measurement properties. That is,
individual differences in state variables may properly be quite unstable.
Such instability, however, does not imply that the measures are unreliable
or invalid as measures of the psychological states. Instead, one can
support the reliability and validity of the state instruments by showing
that they have appropriate measurement properties while being sensitive to
lability in individual differences in the underlying dimensions. The
parallel forms for 8SQ Anxiety allowed us to test the following
hypotheses: (1) the 8SQ measures have equal factor loadings and equal
true-score variances within each longitudinal occasion, (2) the 8SQ forms
have equal error variances within each longitudinal occasion, (3) the
factor loadings and error variances for the alternate forms are equal
across longitudinal occasions, and (4) the Spielberger State Anxiety Scale
is congeneric, but not tau-equivalent, with the 8SQ Anxiety forms (see
Lord & Novick, 1968, for a discussion of these different assumptions). The
first two hypotheses relate to the parallelism of the Anxiety measures at
each longitudinal occasion, whereas the third hypothesis stipulates that
the measurement properties are invariant across the longitudinal
occasions. The fourth hypothesis implies that the Spielberger Scale, with
the 8SQ Anxiety scale, will form a latent variable that accounts for all
its reliable variance. Of these measurement-property hypotheses,
Nesselroade et al. (1984) tested only for invariant factor loadings across
longitudinal occasions.

These hypotheses can be understood by reference to Figure 1, which shows
the basic model for the two latent variables originally tested by
Nesselroade et al. (1984). In a preliminary analysis, we discovered that
the longitudinal model could be fit best by allowing autocovariance
between the residuals for the Spielberger tests and Fatigue subscales B
and C. That is, we modeled a residual covariance for the Spielberger test
between Time 1 and Time 2, a residual covariance for Fatigue B between
Time 1 and Time 2, and a residual covariance for Fatigue C between Time 1
and Time 2. These residual covariances are depicted in Figure 1. The
presence of the residual covariance for the Spielberger Scale forces us to
abandon Hypothesis 4: the Spielberger Anxiety Scale is not a congeneric
measure of the latent Anxiety variable in the presence of the 8SQ Forms A
and B. There is a reliable but specific component of variance present in
the Spielberger test that covaries with itself over time. However, the
other hypotheses were strongly supported by the data (see Table 1).

Our reanalysis confirmed that Forms A and B have equal factor loadings and
equal error variances and are therefore parallel forms (Joreskog, 1971),
and that these measurement properties (including the variances of the
factors) did not change upon the second administration. We also found the
reliabilities of the alternate forms for Anxiety to be unchanging over
time. The estimated reliability coefficients of Forms A and B were .88 in
this older population. Table 2 gives the parameter estimates from the
final model. The upper half of the factor-covariance matrix presents the
correlations among the latent factors. The latent factors have moderate
and stationary correlations. The estimated autocorrelations for Anxiety
and Fatigue were .63 and .71, respectively.

These autocorrelations, which reflect the stability of individual
differences, reach a maximum of 1.0 when individual differences are
perfectly preserved over time (Baltes, Reese, & Nesselroade, 1977;
Blalock, [year illegible]; Wheaton, Muthen, Alwin, & Summers, 1977). The
autocorrelations are substantial for the


FIG. 1.--Representation of Anxiety and Fatigue interrelationships and
stability as modeled by Nesselroade et al. (1984).


TABLE 1

Goodness-of-Fit Indices for Older Adults' Mood State Models

Model                                                  χ²     df   p     GFI^a   AGFI^b   Δχ²^c   Δdf^d   p
O1. Basic model (see Fig. 1)                           57.19  49   .20   .923    .88
O2. Tau-equivalence for Forms A and B (λ1 = λ2)        57.49  50   .22   .923    .88      .30     1       NS
O3. Parallelism for 8-state (over time and
    within time)                                       58.18  53   .29   .923    .89      .69     3       NS
O4. Parallelism and stationary latent variances
    (φ11 = φ33, φ22 = φ44)                             58.50  55   .35   .922    .89      .32     2       NS
O5. Add stationary covariance (within occasion)
    (φ21 = φ43)                                        58.73  56   .38   .922    .89      .23     1       NS

a LISREL goodness-of-fit index.
b LISREL adjusted goodness-of-fit index.
c Change in χ² from preceding model.
d Change in df from preceding model.
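The nested-model comparisons in Table 1 all follow one pattern: each model adds constraints to the preceding one, and the change in chi-square is evaluated against the change in degrees of freedom. The following is a minimal Python sketch of that comparison, assuming the usual chi-square difference test at the .05 level. The function name and the hard-coded critical values are illustrative assumptions; the original analyses were run in LISREL.

```python
# Upper-tail .05 critical values of the chi-square distribution for df = 1..6
# (standard tabled values, hard-coded so the sketch needs no extra libraries).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square_difference(chi2_restricted, df_restricted, chi2_base, df_base):
    """Return (delta_chi2, delta_df, significant_at_05) for two nested models."""
    delta_chi2 = round(chi2_restricted - chi2_base, 2)
    delta_df = df_restricted - df_base
    significant = delta_chi2 > CRITICAL_05[delta_df]
    return delta_chi2, delta_df, significant


# (model label, chi-square, df) as reported in Table 1.
table1 = [("O1", 57.19, 49), ("O2", 57.49, 50), ("O3", 58.18, 53),
          ("O4", 58.50, 55), ("O5", 58.73, 56)]

# Compare each model with the one immediately preceding it.
comparisons = {}
for (_, chi2_prev, df_prev), (name, chi2, df) in zip(table1, table1[1:]):
    comparisons[name] = chi_square_difference(chi2, df, chi2_prev, df_prev)
    print(name, comparisons[name])
```

None of the added constraints produces a significant loss of fit, which is why the most constrained model can be retained.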


TABLE 2

LISREL Estimates for Final Model on Older Adults' Mood States

Factor Pattern Weights and Uniquenesses

          Anxiety 1      Fatigue 1     Anxiety 2      Fatigue 2     θ
SPIEL1    1.0*           0             0              0             55.?? (?)
ANXA1     2.36 (.22)^a   0             0              0             25.23 (2.3?)^b
ANXB1     2.36 (.22)^a   0             0              0             25.23 (2.3?)^b
FATA1     0              1.0*          0              0             1.22 (.2?)
FATB1     0              .81 (.0?)     0              0             4.30 (.63)
FATC1     0              .84 (.05)     0              0             1.32 (.25)
SPIEL2    0              0             1.0*           0             46.85 (6.5?)
ANXA2     0              0             2.36 (.22)^a   0             25.23 (2.3?)^b
ANXB2     0              0             2.36 (.22)^a   0             25.23 (2.3?)^b
FATA2     0              0             0              1.0*          .55 (.25)
FATB2     0              0             0              .81 (.0?)     2.61 (.4?)
FATC2     0              0             0              .84 (.05)     2.41 (.3?)

Factor Covariance Matrix^c

             Anxiety 1        Fatigue 1       Anxiety 2        Fatigue 2
Anxiety 1    33.46 (7.20)^d   .69             .63              .48
Fatigue 1    9.32 (1.63)^e    5.51 (.73)^f    .44              .71
Anxiety 2    21.03 (5.53)     5.91 (1.62)     33.46 (7.20)^d   .69
Fatigue 2    6.51 (1.63)      3.98 (.70)      9.32 (1.63)^e    5.51 (.73)^f

Note. * denotes fixed parameter. Abbreviations: SPIEL1, SPIEL2 = Spielberger Anxiety Scale, Times 1 and 2; ANXA1, ANXB1, ANXA2, ANXB2 = 8-State Anxiety Scales, Forms A and B, at Times 1 and 2; FATA1, FATB1, FATC1, FATA2, FATB2, FATC2 = Fatigue item packets A, B, and C at Times 1 and 2.
a Constrained equal regression of 8-state variables on Anxiety.
b Constrained equal 8-state measurement error variances.
c Values above diagonal are factor correlations.
d Constrained equal Anxiety factor variances.
e Constrained equal covariances of Anxiety and Fatigue at Times 1 and 2.
f Constrained equal Fatigue factor variances.
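The factor correlations reported above the diagonal of the factor covariance matrix follow directly from the variances and covariances below it. The following is a small Python sketch of that conversion, using the Anxiety and Fatigue estimates as they can be read from Table 2. The function and variable names are illustrative assumptions.

```python
import math


def factor_correlation(covariance, variance_i, variance_j):
    """Convert a factor covariance into a correlation."""
    return covariance / math.sqrt(variance_i * variance_j)


# Factor variances, constrained equal across the two occasions (Table 2).
var_anxiety = 33.46
var_fatigue = 5.51

# Stability (autocorrelation) of Anxiety between Time 1 and Time 2.
anxiety_stability = factor_correlation(21.03, var_anxiety, var_anxiety)

# Within-occasion correlation of Anxiety and Fatigue at Time 1.
anxiety_fatigue_t1 = factor_correlation(9.32, var_anxiety, var_fatigue)

print(round(anxiety_stability, 2), round(anxiety_fatigue_t1, 2))
```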
state measures and may indicate less lability in mood states in older populations. Nevertheless, the autocorrelations do not approach the maximum of 1.0 even though a period of only 1 month separates the measurement occasions. This level of stability can be contrasted with data on psychometric intelligence in older populations reported by Hertzog and Schaie (1986), in which the autocorrelation of a general intelligence factor exceeded .9 over 7-year intervals!

The current analysis shows that the moderate level of stability in individual differences is not a function of lack of reliability in the state measures (see also Nesselroade et al., 1984; Nesselroade, Pruchno, & Jacobs, 1985). Instead, it is attributable to lability of individual differences in latent states.

Younger adults' data. In a second set of data, both Forms A and B of the Curran and Cattell 8SQ Anxiety Scale were administered to 42 college students at each of four occasions of measurement (Nesselroade et al., 1985). Approximately 4 days elapsed between successive measurement occasions. Although the sample size here is small for purposes of confirmatory factor analysis (see Tanaka, 1987, in this issue), the data set is didactically useful.

Our reanalysis focused again on the measurement properties of Forms A and B. Figure 2 shows the basic model originally estimated by Nesselroade et al. (1985). We used the same model on the covariance matrix of the alternate forms and tested the hypotheses of parallelism and stable measurement properties over time. Table 3 summarizes a set of models testing parallelism in Forms A and B within occasions and over time. It appears that Forms A and B have unchanging measurement properties over time, but that there is some indication that they are not perfectly parallel forms in this younger population. The reliability of the scales is high. Based on the results from Model Y3, we estimate the reliability of Form A at 1.0 and the reliability of Form B at .87. From Model Y5 (complete parallelism), the reliability of both forms is estimated at .94.

Nesselroade et al. (1985) found that the correlations among the Anxiety factors across the longitudinal occasions were quite low. We examined these correlations by testing the hypothesis of orthogonal factors, that is, by requiring all factor covariances to be fixed to 0. The fit of this model, presented in the last row of Table 3, was not significantly worse than that of the preceding model of complete parallelism in error variances.

Fig. 2. Representation of Anxiety scale reliabilities and latent variable stabilities as modeled by Nesselroade et al. (1985).

The young adults' data provide an even stronger demonstration of the differentiation of stability and reliability in state variables such as anxiety. The factor correlations of the latent Anxiety variable over time are so low that we cannot reject the hypothesis that the factors are uncorrelated in the young adult population,1 and yet the reliabilities of the anxiety measures are high. Finally, in spite of these low covariances, there is still stationarity in the variances of Anxiety over the four occasions of measurement.

1 We are not suggesting the factors are orthogonal in student populations. The statistical power of this test is not high, given the relatively small sample size. The important point is that, even if the population correlations are not exactly zero, they are indeed small relative to the reliabilities of Forms A and B.

TABLE 3

Goodness of Fit for Alternative Models of Young Adults' Anxiety

Model                                                  χ²     df   p     GFI^a   AGFI^b   Δχ²^c   Δdf^d   p
Y1. Basic model (λ for Form B equal over time)         14.11  17   .67   .93     .84
Y2. Tau-equivalence (all λ = 1.0)                      16.06  18   .59   .92     .84      1.95    1       NS
Y3. Equal true scores (all diagonal φ equal
    over time)                                         17.08  21   .71   .91     .85      1.02    3       NS
Y4. Within-occasion parallelism                        27.07  25   .35   .87     .81      9.99    4       <.05
Y5. Complete parallelism                               32.50  28   .26   .85     .81      5.43    3       >.10
Y6. Parallelism and orthogonal factors                 39.56  34   .24   .83     .82      7.06    6       NS

a LISREL goodness-of-fit index.
b LISREL adjusted goodness-of-fit index.
c Change in χ² from preceding model.
d Change in df from preceding model.

Summary. The published literature and the reanalysis reported here present a coherent picture: factor-analytic work has demonstrated the existence of state dimensions that can be reliably measured. These state dimensions behave well when analyzed with confirmatory factor models enabling assessment of their psychometric properties. Taken together, the results of the analyses just reported suggest that the state measures have stable measurement properties over time, stationary covariance structures, and considerably less than perfect stability of individual differences. Thus, in state measures, low stability is not a sign of poor measurement properties of the measures but rather an indication of a high degree of lability of individuals on the underlying state dimensions. Consideration of psychological states requires scientists to select carefully research designs and techniques appropriate for assessing states and state measures. For example, test-retest coefficients are invalid reliability estimators for state measures, given lability of the states themselves. The implications of stationarity in the covariance structure of states are important with respect to the analysis of longitudinal data, as we shall now discuss.
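The claim that test-retest coefficients are invalid reliability estimators for state measures can be illustrated with the classical attenuation relation: the correlation between two fallible measures equals the correlation between the underlying latent variables multiplied by the square root of the product of the two reliabilities. The following Python sketch assumes that relation holds. The function name and the latent stability of .20 are hypothetical illustration values; the reliability of .94 is the complete-parallelism estimate reported above.

```python
import math


def expected_retest_correlation(latent_stability, reliability_t1, reliability_t2):
    """Observed test-retest correlation implied by the classical attenuation
    relation: latent correlation times sqrt(reliability_1 * reliability_2)."""
    return latent_stability * math.sqrt(reliability_t1 * reliability_t2)


# A highly reliable measure of a perfectly stable attribute (trait-like case).
trait_like = expected_retest_correlation(1.0, 0.94, 0.94)

# The same reliable measure applied to a labile state
# (latent stability of .20 is a hypothetical value).
state_like = expected_retest_correlation(0.20, 0.94, 0.94)

print(trait_like, state_like)
```

With identical reliabilities, the observed retest correlation falls from about .94 to about .19 purely because the latent state is labile, so a low retest coefficient says little about measurement quality.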

Characteristics  of  Autoregressive 
Structural  Equation  Models 

In this section we examine closely the underlying assumptions of autoregressive models and demonstrate that a basic first-order autoregressive model inadequately accounts for the fact that, in the Nesselroade et al. (1984) state data, the anxiety and fatigue factors maintain a moderately strong covariance with each other at the two time points.

Figure 3 shows a simple autoregressive model for two latent variables across three times of measurement (for the sake of simplicity, the measurement model is not depicted). The basic feature of the model is that each variable causes itself at the immediately following occasion of measurement. These autoregressions are the coefficients $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$, depicted by solid lines. The latent variable at any occasion of measurement, call it $t$, is a function of itself at the preceding measurement occasion:

$$\eta(t) = f[\eta(t - 1)]$$


This model is a first-order autoregressive model because only relationships of lag 1 ($t - 1$ to $t$) are structured in the model. The autoregressive model depicted in Figure 3 contains two possibilities for causal influences of latent variables on other latent variables: cross-lagged regressions or simultaneous regressions. In Figure 3, the dashed lines represent these simultaneous (or reciprocal) influences. It should be emphasized that the model shown in Figure 3 is illustrative only; not all cross-lagged and simultaneous regressions shown can be identified and estimated. Whether one should model lagged regressions, simultaneous regressions, or some combination of the two is a matter of theory relating the timing of causal relations to the time interval in the panel design (see Kessler & Greenberg, 1981).

Let us assume for the moment that the model shown in Figure 3, including only the autoregressions (solid lines), is the true model. The model posits an initial covariance between the two different latent variables ($\phi_{21}$), but these variables do not cause change in each other over time. Hertzog (1986) referred to this model as the isolated stability model, because there is nonzero stability in each latent variable (modeled through the autoregressive coefficients), but this stability is an isolated autoregression not buttressed by cross-lagged regressions between the two latent variables. This model has been discussed by Dwyer (1983) and Rogosa (1979) as an important null hypothesis model to be rejected before alternative cross-lagged or simultaneous causal relationships between the variables can be taken seriously.

Fig. 3. Basic structural regression model with autoregression coefficients (solid lines) and cross-regression coefficients (dashed lines).

The isolated stability model is an entropic model in the sense that, in the absence of cross-lagged regressions, the covariance between the two variables will steadily decrease over time unless there is perfect stability of individual differences over time. For simplicity of exposition, we will deal for the moment with the correlations among the latent variables.2 Assuming no omitted causes of the two latent variables, the population correlation between the two latent variables at Time 2 is

$$\rho = \phi_{21} \beta_1 \beta_2$$

At Time 3, the correlation is

$$\rho = \phi_{21} \beta_1 \beta_2 \beta_3 \beta_4$$

In general, the isolated stability model predicts decreases over time in the within-time correlations among the two latent variables unless each (standardized) $\beta$ is 1; with an infinite number of occasions, the correlation decays to the entropic minimum of 0.
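The entropic decay described here can be made concrete with a short Python sketch. It assumes standardized latent variables, no omitted causes, and stationary variances, so that each standardized autoregression coefficient equals the corresponding autocorrelation. The function name is illustrative. The input values (.69 for the Time 1 Anxiety-Fatigue correlation, .63 and .71 for the two autocorrelations) are the older adults' estimates reported earlier.

```python
def implied_within_time_correlations(time1_correlation, betas_var1, betas_var2):
    """Within-time correlations implied by the isolated stability model.

    betas_var1[k] and betas_var2[k] are the standardized autoregression
    coefficients carrying each variable from occasion k+1 to occasion k+2.
    Each step multiplies the previous within-time correlation by both
    coefficients.
    """
    correlations = [time1_correlation]
    for beta_1, beta_2 in zip(betas_var1, betas_var2):
        correlations.append(correlations[-1] * beta_1 * beta_2)
    return correlations


# Two occasions, using the older adults' estimates.
older_adults = implied_within_time_correlations(0.69, [0.63], [0.71])

# Hypothetical extension to six occasions with the same coefficients.
long_run = implied_within_time_correlations(0.69, [0.63] * 5, [0.71] * 5)

print(older_adults)
print(long_run)
```

Under these assumptions the model implies a Time 2 correlation of about .31, well below the .69 actually estimated at both occasions, which is the discrepancy that the isolated stability model cannot absorb.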

What then can account for the fact that in some cases within-time correlations between latent variables stay the same (as in the Nesselroade et al. data) or even increase? Mathematically, we have seen that variables in the system will become increasingly less correlated unless either (1) the correlations of the variables with themselves are 1.0 over time, or (2) through mutual causation (or causation by variables external to the system), the correlation among the variables is "built up," so to speak. The first case is one of perfect stability: nothing is changing, at least at the level of individual differences about the latent variable means. And if nothing changes, then correlations among different latent variables are preserved. But if stability of individual differences is less than perfect (that is, if there are in fact individual differences in change over time), then the correlations among the latent variables will decrease unless such changes in the two latent variables are themselves correlated due to mutual causal influence (as instantiated in cross-lagged regressions) or mutually shared causes (other than the two variables themselves).

This argument is quite complicated, so let us summarize: the implicit assumption in the autoregressive model seems to be as follows: individual differences will remain perfectly stable, and hence perfectly predictable through autoregression, unless external causes act to change the variables measured in the system. Dwyer (1983) has characterized this assumption as one of temporal inertia. The implicit corollary of this assumption is that, if stability is imperfect, there has been change in individual differences that can be modeled as a function of the causes of change. This is the apparent rationale for using the regression of the latent variable on its causes, partialed for autoregression, as a measure of the magnitude of causal influence (see Kessler & Greenberg, 1981).

Given our earlier discussion, this assumption clearly resonates with a trait conception of constructs and change in constructs. Inertia, or stability of individual differences, is expected unless other variables act to change the underlying attributes being measured. This is the basis of our concern with the standard autoregressive model for portraying longitudinally measured variables. Under the trait conception, it makes sense to assume perfect stability of individual differences unless the system is perturbed by causal influences. This assumption appears to make sense for certain psychological phenomena, that is, those suspected to be enduring, such as stable attributes of individuals that have reached a determined end state (i.e., a stable individual-differences distribution). The assumption of inertial stability of individual differences modeled via autoregressive coefficients makes little sense for fluctuant attributes such as psychological states.

Recent developments in the methodological literature have demonstrated that autoregressive structural equation models should not be routinely considered the method of choice for analyzing change (e.g., Rogosa, in press; Rogosa & Willett, 1985). Rather, use of an autoregressive model must be dictated by the nature of the research question and the characteristics of the psychological phenomena under study. Rogosa and Willett (1985) have criticized the rationale of the autoregression model rather severely on the grounds that the partialed, cross-lagged regression coefficient, removing autoregression, is a poor representation of change and the causal variables' influence on change. In fact, they argue that it is often "too easy" to fit autoregressive models to longitudinal data.

2 The entropic nature of the model holds for covariances as well (see Dwyer, 1983).

In the next section we will empirically examine the usefulness of the autoregressive model with regard to the adulthood data of Nesselroade et al. (1984). Given the stationary covariance structure that we identified for the older adults' mood states, one might be optimistic that an autoregressive model with cross-lag relations will fit the data. As we shall see, this is not the case.

Fitting  Autoregressive  Models  to  the 
Mood  State  Data 

Our assessment of the autoregressive model's effectiveness in modeling mood states is based solely on the older adults' data reported by Nesselroade et al. (1984). Given that we found no substantial covariance among the Anxiety factors for the undergraduate sample studied by Nesselroade et al. (1985), one could say that the autoregressive model is trivially satisfied by modeling no association in the covariance structure with autoregression coefficients of 0.

We fit a series of autoregressive models similar to that shown in Figure 3 (for two occasions of measurement only) to the older adults' data from Nesselroade et al. (1984). The measurement model for all the autoregression models used was O3 (see Table 1), specifying parallelism for the eight-state measures and correlated measurement residuals for the Spielberger and Fatigue scales. This measurement model may be considered a basis model for evaluating the fit of our autoregressive structural models. The best fit an autoregressive model could achieve is the fit of O3, which placed no constraints whatsoever on the latent covariance matrix; that is, all latent factors were allowed to covary. In Bentler and Bonett's (1980) terms, Model O3 is equivalent to a saturated model (one just identified in its structural regression equations). Therefore, we can assess the adequacy of our autoregressive models by testing their difference in fit from the basis measurement model O3 (see Hertzog, 1986).

Our first regression model specified was an isolated stability model containing autoregressions but no cross-lagged or simultaneous regression of Anxiety and Fatigue on each other. Table 4 gives the goodness of fit of this model (Model A1). It did not fit well, especially relative to the original measurement model (Model O3). This lack of fit is to be expected, and even desired, for it indicates a lack of fit to the latent covariances of a null hypothesis model of isolated stability. According to the logic of cross-lagged regression analysis, rejection of Model A1 opens the possibility that cross-lagged regressions involving Anxiety and Fatigue are required to fit the data.

The next model, A2, fitted cross-lagged regressions as well. It did not improve on the poor fit of the isolated stability model (see Table 4). Moreover, the cross-lagged regressions were not statistically significant.


TABLE 4

Goodness of Fit of Autoregressive Models for Older Adults' Mood States

Model                                                  χ²      df   p     GFI^a   AGFI^b   Δχ²^c   Δdf^d   p
Measurement model (O3 from Table 1)                    58.18   53   .29   .923    .89
A1. Isolated stability                                 107.04  56   .00   .872    .82      48.86   3       <.001
A2. Cross-lagged regression                            106.27  54   .00   .872    .82      48.09   1       <.001
A3. Simultaneous regressions at Time 2                 68.71   54   .09   .910    .87      10.55   1       <.01
A4. Just-identified cross-lag with correlated
    residual, Time 2                                   58.18   53   .29   .923    .89
A5. Isolated stability with correlated
    residual, Time 2                                   58.22   55   .36   .923    .89      .04     2       NS

a LISREL goodness-of-fit index.
b LISREL adjusted goodness-of-fit index.
c Change in χ² from measurement model (O3).
d Change in df from measurement model.
It is possible that the time lag is too long in these data to detect the true influences of Anxiety and Fatigue on each other using cross-lagged regressions. If so, then an obvious alternative is to specify simultaneous, reciprocal influences of Anxiety 2 and Fatigue 2 (see Dwyer, 1983; Heise, 1975). An alternative model, A3, specifying only autoregressions and the reciprocal causal influences of Anxiety and Fatigue on each other at Time 2, fared much better than Model A2 but still did not achieve the same level of fit as the measurement model. Each of these models (A2 and A3) fit poorly in spite of the fact that they have but 1 df in the structural equations: both estimate nine parameters (the two latent variances and the latent covariance at Time 1, four regression coefficients, and two residual variances at Time 2).

Given that we were limited in these data to two occasions of measurement, it was possible to improve the fit of the first-order autoregressive model by adding a residual covariance between Anxiety and Fatigue at Time 2. This model, A4, fit exactly as well as the original measurement model (see Table 4). This equivalent fit was no accident, however, as it was statistically determined by the fact that Model A4 is just identified in the structural model. The model created 10 unique latent variances and covariances and estimated in turn 10 structural regression parameters: the latent variances and covariance at Time 1, four regression coefficients, two residual variances at Time 2, and the residual covariance at Time 2. In other words, the autoregression model was salvaged, but only by removing all restrictions on the latent covariance structure. However, both of the cross-lagged coefficients were statistically indistinguishable from 0. Specifically, the regression of Anxiety 2 on Fatigue 1 was estimated at .26 (SE = .25), and the regression of Fatigue 2 on Anxiety 1 was estimated to be -.003 (SE = .04). Indeed, Model A5, removing the cross-lagged regressions and specifying only the residual covariance at Time 2, provided an adequate fit to the data. Thus an autoregressive model can fit the Nesselroade et al. (1984) data, but only if we are willing to accept (1) isolated stability in autoregression and (2) the residual covariance as theoretically meaningful specifications.
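The degrees-of-freedom bookkeeping behind these structural models can be checked with a short Python sketch. With four latent variables there are 10 unique latent variances and covariances, and each model's structural df is 10 minus the number of structural parameters it estimates. The parameter counts for A2, A3, and A4 are taken from the text; the counts for A1 and A5 are inferred here from the model descriptions (an assumption), and they agree with the changes in df listed in Table 4. The names below are illustrative.

```python
def unique_moments(n_latent):
    """Number of distinct variances and covariances among n latent variables."""
    return n_latent * (n_latent + 1) // 2


# Structural parameters estimated by each model for two variables at two occasions.
# Every model estimates 2 variances + 1 covariance at Time 1 and
# 2 residual variances at Time 2 (5 parameters in all).
COMMON = 5
models = {
    "A1": COMMON + 2,          # two autoregressions only (inferred count)
    "A2": COMMON + 4,          # autoregressions plus two cross-lagged regressions
    "A3": COMMON + 4,          # autoregressions plus two reciprocal regressions
    "A4": COMMON + 4 + 1,      # A2 plus the Time 2 residual covariance
    "A5": COMMON + 2 + 1,      # A1 plus the Time 2 residual covariance (inferred)
}

# Structural df = unique latent moments minus estimated structural parameters.
structural_df = {name: unique_moments(4) - n_params
                 for name, n_params in models.items()}
print(structural_df)
```

A model with zero structural df, such as A4, reproduces the latent covariance matrix exactly, so its fit necessarily equals that of the measurement model.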

How would we interpret the residual covariance? In structural regression analysis, it is common to argue for residual covariances under the assumption that there are omitted causes of the variables in the model that are shared between the variables (thus producing the residual covariance). In fact, Model A5 suggests that all relevant causes of Anxiety and Fatigue have been omitted, excepting of course the effect of each variable on itself as reflected in autoregression. At this point, however, it seems appropriate to question the need for an autoregressive model at all. This issue is considered further below.

To summarize, the important conclusion from this section is that the basic cross-lagged regression model that might be thought of as the "standard" approach to modeling a two-wave, two-variable problem (e.g., Dwyer, 1983; Rogosa, 1979) cannot account for the stationary covariance structure we identified for the Nesselroade et al. (1984) data.

Alternative  Approaches  to  Modeling 
State  Phenomena 

If covariance-structures approaches are to be used to model flux in psychological states, the arguments advanced here support the need to examine alternatives to conventional ways of fitting autoregressive models to panel data.

A different longitudinal panel design. One could argue that the successful model, A5, provides an important suggestion as to the appropriate method for modeling causes of mood states. Given the salience of the correlated residual between Anxiety and Fatigue at Time 2, it appears that mutual causes of the two mood states have simply been omitted from the model. The obvious suggestion, then, is to expand the model to include the causes of the mood states at Time 2. The top panel (a) of Figure 4 depicts this alternative model. Given the standard rationale for the autoregressive coefficients in the model, these exogenous causal influences determine change in mood states between the two occasions. One could also expand the model to include these causes at Time 1 and model their stability over time as well.

However, our consideration of the distinction between trait and state variables calls into question the logic of assuming temporal inertia (stability) of the state variables over time. If the concerns regarding autoregressive models raised by Rogosa and colleagues (e.g., Rogosa & Willett, 1985), among others, are valid, then one should not assume that the autoregressive model is an optimal statistical method for measuring change. In that case one must consider whether it makes sense on logical grounds to argue for temporal inertia (stability) for state variables. If not, then usage of autoregressive models would appear to be contraindicated. Is it the case that one's mood at Time 1 directly causes one's mood at Time 2? The answer to this kind of question depends upon what one's theory about psychological states says about the behavior of states in the time interval between Times 1 and 2. In the case of mood states, if the interval is a matter of minutes, then perhaps there is appreciable inertia. If the time interval is a matter of months or years, then inertia per se seems unlikely. Mood states could be correlated over time, but probably not as a direct function of carryover effects from moods experienced some months prior.

This logical analysis leads us to suggest that an entirely different class of models may be needed for panel designs measuring changeable phenomena such as psychological states, namely, designs that completely eliminate autoregressive coefficients. The lower half (b) of Figure 4 depicts an example of this alternative modeling philosophy. The determinants of mood states are modeled as having concurrent (simultaneous) influences. The model allows for autocorrelation of mood states over the time interval but only as a function of the correlations among the determinants of mood across time. One could, of course, posit and model autoregressive relationships among the determinants themselves, if doing so were justified on theoretical grounds.
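The logic of this alternative can be sketched numerically. Suppose, purely for illustration, that a state at each occasion is a linear function of one concurrent determinant plus an occasion-specific residual, with no autoregressive path from the earlier state. The autocorrelation of the state is then carried entirely by the autocorrelation of its determinant. The Python sketch below works this out under stated assumptions: a single standardized determinant, residuals uncorrelated over time and with the determinant, and equal parameters at both occasions. All names and numeric values are hypothetical and do not come from the original analyses.

```python
def implied_state_autocorrelation(gamma, determinant_autocorrelation,
                                  residual_variance):
    """Autocorrelation of a state driven only by a concurrent determinant.

    Assumed model (same parameters at both occasions):
        state_t = gamma * x_t + e_t
    with Var(x_t) = 1, Corr(x_1, x_2) = determinant_autocorrelation, and
    residuals e_t uncorrelated over time and with x.
    """
    state_variance = gamma ** 2 + residual_variance
    state_covariance = gamma ** 2 * determinant_autocorrelation
    return state_covariance / state_variance


# A determinant with autocorrelation .50 that explains half the state variance.
moderate = implied_state_autocorrelation(1.0, 0.5, 1.0)

# If the determinant itself does not persist, neither does the state.
no_persistence = implied_state_autocorrelation(1.0, 0.0, 1.0)

print(moderate, no_persistence)
```

In this sketch the state shows a nonzero autocorrelation without any inertia of its own, which is the pattern the model in the lower panel of Figure 4 is meant to capture.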

This class of model has actually been evaluated by Hargens, Reskin, and Allison (1976) in an analysis of measurement error in panel data on scientific productivity. Analogous to the results reported here, Hargens et al. (1976) had difficulty fitting a first-order autoregressive model to yearly data on scientific productivity (as measured by variables such as the number of publications per year). Full consideration of the alternatives considered by Hargens et al. (1976) would be impossible here, but a citation of their main conclusion seems appropriate.

Recent models for the estimation of measurement error from panel data assume a lag-1 autoregressive [process] in the true-score variable with uncorrelated disturbances. We believe this assumption will usually be problematic for sociological variables that typically are determined by other variables having stability over time . . . we have presented a model [that] . . . assumes a first-order autoregressive process among the disturbances and an absence of any lagged effects in the true-score variable. This model seems particularly appropriate for variables like scientific productivity, which must be created or produced anew for each time interval, in contrast to variables that have an internal principle of stability (i.e., which tend to remain the same unless acted upon from without). [P. 457]

Clearly, the concerns raised by Hargens et al. (1976) extend beyond methods of estimating measurement error and are consistent with the arguments given here regarding the utility of autoregressive models for state variables.

We recommend that the common practice of using first-order autoregressive models for panel data be preceded by: (1) careful logical analysis of the assumptions of temporal inertia implied by this type of model, and (2) consideration of alternative models such as the one presented in Figure 4b. Where the endogenous variables in the panel design are seen primarily as transient states, determined by concurrent or temporally lagged, situationally-specific causes, the routine application of first-order autoregressive models may be both illogical and unwise.

Of course, psychological variables may contain both
statelike and traitlike components. In such cases, a
priori consideration of the existence of such state
components, as well as theorizing about the possible
effects of transient, statelike influences on these
components, may suggest panel designs in which these
influences are directly measured. To scientists such as
ourselves, such state components might be of central
interest and a primary focus of the research. But the
trait-oriented scientist would be well advised, under
such circumstances, to identify and remove such
components of variance from the "inertial" endogenous
variables of interest. This adjustment could, in theory,
be accomplished by a measurement model identifying the
multiple components and their determinants (as in the
multitrait-multimethod design; Joreskog, 1974) or by
including the state antecedent variables in the model,
as in Figure 4b, but retaining the autoregressive path
to represent the traitlike component of the
psychological variable. Failure to account for such
statelike components would necessarily bias statistical
estimates of causal influence and autoregressive
stability.
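The contrast between temporal inertia and statelike variation can be made concrete with a small sketch. It is not the Figure 4b model; it only assumes two textbook cases: a stationary first-order autoregressive process, whose implied correlations shrink geometrically with the lag, and a score made of a stable trait plus an occasion-specific state that is uncorrelated over time, whose implied correlation is the same at every lag. The function names and numerical values are hypothetical.

```python
def ar1_correlation(stability, lag):
    """Correlation implied by a stationary first-order autoregressive
    process between occasions separated by `lag` steps."""
    return stability ** lag


def trait_state_correlation(var_trait, var_state):
    """Correlation between any two occasions when each score is a stable
    trait plus an occasion-specific state uncorrelated over time."""
    return var_trait / (var_trait + var_state)


# Hypothetical values, chosen only for illustration.
for lag in (1, 2, 3):
    print(lag,
          round(ar1_correlation(0.8, lag), 3),
          round(trait_state_correlation(1.0, 3.0), 3))
```

Under these assumptions the autoregressive column decays with the lag while the trait-plus-state column stays flat, which is one way to see why fitting the former to data generated by the latter could misstate stability.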

Modeling states at the intraindividual level. A
qualitatively different alternative has been in the
literature for 40 years but has recently begun to
receive renewed consideration, namely, structuring the
flux in states directly at the intraindividual level.
Termed P-technique by Cattell (Cattell, 1955; Cattell,
Cattell, & Rhymer, 1947), this approach involves
collecting data by assessing multiple attributes of one
individual over many occasions of measurement (see also
Nesselroade, in press; Nesselroade & Ford, 1985).

The covariance matrix generated by P-technique data
represents the covariation of occasion-to-occasion
changes in different attributes of the individual. It
can be analyzed by confirmatory factor analysis. Latent
variables that are identified by such procedures
manifest, by definition, coherent intraindividual
variability (lability), in the sense that such lability
is consistent over multiple indicators of the latent
variable. Under some circumstances (e.g., Cattell,
1966), occasion-by-occasion scores on these latent
variables (factor scores) can be estimated and
subjected to further analysis.
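As a minimal sketch of the data layout this implies, the code below computes the covariance matrix for one person measured on several attributes over several occasions. The attribute names and scores are invented for illustration, and the function is a plain sample covariance rather than any particular P-technique program.

```python
def covariance_matrix(scores):
    """Sample covariance matrix over occasions.

    `scores` is a list of occasions; each occasion is a list holding one
    score per attribute, all for the SAME individual.
    """
    n_occasions = len(scores)
    n_attributes = len(scores[0])
    means = [sum(row[j] for row in scores) / n_occasions
             for j in range(n_attributes)]
    cov = [[0.0] * n_attributes for _ in range(n_attributes)]
    for j in range(n_attributes):
        for k in range(n_attributes):
            cross = sum((row[j] - means[j]) * (row[k] - means[k])
                        for row in scores)
            cov[j][k] = cross / (n_occasions - 1)
    return cov


# Hypothetical single-subject data: columns are anxiety, tension, calm.
scores = [
    [2, 1, 5],
    [4, 2, 4],
    [6, 3, 3],
    [8, 4, 2],
    [10, 5, 1],
]
cov = covariance_matrix(scores)
for row in cov:
    print(row)
```

A matrix of this kind, rather than a matrix computed across persons, would be the input to the factor analysis described above.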

P-technique, due to its direct focus on intraindividual
change, provides data for modeling "steady-state"
variability in the organism and both temporary and
permanent changes in steady-state variability. By
combining P-technique with the group design orientation
in the form of concurrent P-technique studies of several
individuals, one can capitalize on the strengths of both
idiographic and nomothetic approaches to the study of
developmental change (Nesselroade & Ford, 1985; Zevon &
Tellegen, 1982). Examination of interindividual
similarities and differences in the characteristics of
the latent variables provides a basis for answering some
important questions concerning the nature of
generalizability of intraindividual change patterns over
the facets of individuals and occasions (Nesselroade,
1963).

Historically, P-technique data have been modeled
primarily by means of simple factor-analysis procedures.
Although the results of such analyses have proven to be
psychologically interesting and meaningful (Cattell &
Scheier, 1961; Roberts & Nesselroade, in press; Zevon &
Tellegen, 1982), the practice has been criticized
(Anderson, 1961; Holtzman, 1962; Molenaar, 1985) because
it does not account for the possibility of
autocorrelations of variables in time series (sometimes
termed “nonindependence” in the time-series literature).
However, recent developments (McArdle, 1982; Molenaar,
1985) appear to provide the means for treating such
statistical problems. It is our hope that this class of
models for single subject behavior will enable
researchers to structure the intraindividual lability
inherent in states as a complementary and viable
alternative to the expanded panel designs shown in
Figures 4a and 4b for modeling interindividual
variations in state variables. If within-person
variability can be first structured at the individual
(idiographic) level by multivariate analysis techniques
and then examined for between-person differences and
similarities, it opens a promising alternative to the
study of generalizability across individuals and the
construction of nomothetic relationships (Nesselroade
& Ford, 1985; Zevon & Tellegen, 1982).
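The autocorrelation concern raised by the critics can be checked descriptively before any factoring. The sketch below computes the usual lag-k sample autocorrelation for a single variable observed over occasions; the series are hypothetical, and this is only a screening statistic, not the dynamic models cited above.

```python
def autocorrelation(series, lag):
    """Lag-`lag` sample autocorrelation of one variable over occasions."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t + lag]
                    for t in range(n - lag))
    return numerator / denominator


# Hypothetical series: a steady drift and a strict alternation.
drifting = [1, 2, 3, 4, 5]
alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(drifting, 1), autocorrelation(alternating, 1))
```

Values far from zero at short lags would suggest that successive occasions are not independent, which is the condition the simple factor-analytic treatment ignores.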

Summary and Conclusion

The analysis of the developmental process can be
approached in a number of ways. Statistical analysis of
covariance structures is one set of important
techniques for this purpose, as this special issue
suggests. In this article we have delineated the
distinction between trait and state dimensions and its
implications for the statistical modeling of
longitudinally measured behavioral attributes.

First, state measures behave lawfully. They can
manifest desirable measurement properties of
reliability and validity while reflecting a
considerable amount of lability of score at the
intraindividual level. Such lability runs counter to
conventional, trait-oriented conceptions of measurement
and models of development. However, it cannot be
dismissed as merely "error of measurement."

Second, the possibility must be recognized that the
individual differences measured at any given occasion
can represent labile characteristics as well as the
more stable, traitlike attributes. The failure to
recognize and model this possibility can lead to biased
estimates of the parameters of traitlike attributes,
including stability of the latent construct and
reliability of its operational expressions, thus
clouding the description and interpretation of data and
related inferences about the nature of change.

Finally, our article has questioned the validity of
standard autoregression models for change in
psychological states. There is no doubt that
autoregressive models will continue to fit many kinds
of developmental phenomena, namely, development of
psychological traits. When that happy circumstance
occurs, there may be no reason to downplay their
importance as descriptive representations of a temporal
process (but see Rogosa & Willett, 1985).

What is in doubt is the universal validity of
autoregressive models representing change over time in
behavioral data. We have argued that dimensions along
which individual differences are displayed are not
homogeneous and uniform. Two important and related ways
that variables differ are (1) temporal characteristics,
and (2) antecedents of change. The assumptions
regarding these dimensions inherent in traditional
autoregressive models appear to be more applicable to
variables characterized by high stability and temporal
inertia (traits) than to variables with low stability
and a high degree of situational and temporal
specificity (states).

Explicit recognition of the differing temporal
characteristics implied by the trait-state distinction
serves to warn us that sole reliance on the
traditional, trait-oriented concepts of differential
psychology will not necessarily lead to an accurate
portrayal of extant differences among individuals.
Rather, an understanding of how and why individuals
differ from one another requires attention both to
dimensions of intraindividual variability and to the
antecedents of intraindividual variability. Obviously,
to account for both the transient and the more stable
components of individual differences will require more
complex models and procedures than envisaged in
trait-oriented approaches. Nevertheless, the
difficulties engendered should be more than offset by
gains in our understanding of developmental processes.
Our two suggestions (first, for expanded autoregressive
models that include state antecedent information and
allow for trait components to be estimated, and second,
for direct modeling of intraindividual change using
single-subject designs) are offered as steps toward
increasing our capacity to model stability and change.

We address two questions of central interest in adult intellectual development: the equivalence of
psychometric tests' measurement properties at different ages, and the stability of individual differences
in intelligence over time. We performed a series of longitudinal factor analyses using the LISREL
program to model longitudinal data from Schaie's Seattle Longitudinal Study. The results indicate
complete invariance in the loadings of five subtests of Thurstone's Primary Mental Abilities battery
on a general intelligence factor. Individual differences in general intelligence were highly stable over
14-year epochs, with standardized factor correlations averaging about .9 between adjacent 7-year
testing intervals. These results indicate that most individuals in this relatively select longitudinal
sample maintained their relative ordering in intelligence.


One of the central questions in adult development regards the
stability of adult intelligence: does intelligence decline with age,
and if so, what is the magnitude of individual differences in
patterns of change (e.g., Botwinick, 1977; Horn & Donaldson, 1980;
Schaie, 1983)? The debate in the literature on the development
of intelligence during adulthood has focused primarily on the
stability of mean levels of intelligence: is there indeed decline,
on average, on different intellectual abilities, and if so, what is
the magnitude of such decline (e.g., Baltes & Schaie, 1976; Horn
& Donaldson, 1976; Schaie & Hertzog, 1983)? The attention
paid to stability of mean levels of intelligence has perhaps diverted
the field from focusing on a different critical (and in some senses
more critical) type of stability: stability of individual differences
in intelligence. How large are individual differences in magnitudes
of age changes in intelligence during the adult years? Some
developmental psychologists have suggested that adult development
is characterized by increasing heterogeneity and by substantial
individual differences in patterns of age change in intelligence
and other cognitive capacities and skills (e.g., Baltes, Dittmann-
Kohli, & Dixon, 1984; Hertzog, 1985; Schaie, 1983). Enhancement
of optimal intellectual development through intervention
(e.g., Schaie & Willis, 1986) requires as a first step the
identification of differential patterns of aging and the isolation of the
causes of such differences.

This article reports data collected as part of the Seattle Longitudinal
Study, which has been supported over an extended period of time by
grants from the National Institutes of Health, the National Institute for
Child and Human Development, and the National Institute on Aging.
Our work is currently supported by Grant R01-AG4770 from the National
Institute on Aging.

Our thanks to William Meredith for advice and comments on our
statistical models and results, and to an anonymous reviewer for helpful
editorial suggestions. The cooperation and support from members and
staff of the Group Health Cooperative of Puget Sound is gratefully
acknowledged.

Measuring stability of individual differences in intelligence is
somewhat more complex than measuring mean level stability.
Although sequential sampling strategies using repeated,
independent cross-sectional samples can be used to assess mean level
stability (e.g., Schaie, 1977; Schaie & Hertzog, 1982), stability
of individual differences can only be addressed by following
individuals in a longitudinal panel design. Cross-sectional designs
can only measure magnitudes of individual differences (as
indicated by the variances) at a single point in time. At any given
point in time, individual differences can be conceptualized as
being determined by an earlier individual differences distribution
and by subsequent individual differences in developmental change
(see Baltes, Reese, & Nesselroade, 1977). Only a longitudinal
design, by directly measuring change at the level of the individual,
can be used to estimate the proportion of individual differences
due to individual differences in change during preceding time
periods (see Hertzog, 1985; Nesselroade & Labouvie, 1985;
Schaie & Hertzog, 1985).
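The conceptualization above can be written as a simple identity: if a later score is the earlier score plus a change score, then the later variance equals the earlier variance, plus the variance of change, plus twice their covariance. The sketch below checks that identity on a handful of invented scores; the variable names and data are hypothetical, and it is meant only to show why change must be observed within individuals to separate these terms.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def variance(x):
    return covariance(x, x)


# Hypothetical scores for four people at a first occasion, and their change.
earlier = [10.0, 12.0, 14.0, 16.0]
change = [1.0, -1.0, 2.0, 0.0]
later = [a + d for a, d in zip(earlier, change)]

total = variance(later)
rebuilt = variance(earlier) + variance(change) + 2 * covariance(earlier, change)
print(total, rebuilt)
```

A cross-sectional sample supplies only the left-hand side of this identity at each age, which is the limitation the paragraph describes.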

This study was designed to provide a careful and detailed
examination of individual differences in intellectual change during
adulthood. It also focuses on a second, critical issue identified
by developmental methodologists regarding the assessment of
change over time in variables such as intelligence. The issue is
whether the constructs under study, and the measures of those
constructs, are actually isomorphic at different ages. Can we
assume that intelligence is the same construct at ages 25 and 75?
Even if intelligence is unchanging or continuous (Kagan, 1980)
across the adult life span, is it the case that psychometric measures
of intelligence are equally reliable and valid as measures of
intelligence at different ages? Baltes and Nesselroade (1970)
identified this issue as one of measurement equivalence: can we
assume invariant measurement properties of empirical measures
at different parts of the life span (see also Eckensberger, 1973)?
As Baltes and Nesselroade indicated (see also Schaie, 1977; Schaie
& Hertzog, 1985), the optimal method for assessing measurement
equivalence is comparative factor analysis, in which the
invariance of the factor structure of the psychometric abilities is
assessed. As discussed elsewhere (e.g., Cunningham, 1978; Schaie
& Hertzog, 1982, 1985), the best approach to the invariance
problem involves the use of confirmatory factor analytic methods
to test the hypothesis of age-related invariance in the factor
structure.

This is the first in a series of articles describing our use of
covariance structures methods to analyze patterns of change and
stability in adult intelligence using data from Schaie's Seattle
Longitudinal Study (SLS). In this article we describe results from
a longitudinal factor model that may be used to assess (a) the
measurement equivalence of the Thurstone Primary Mental
Abilities battery used in the SLS and (b) the extent to which
individuals in the SLS vary in patterns of intellectual change
during the adult years. The Primary Mental Abilities test was
developed by Thurstone and Thurstone (1941, 1949) to measure
factorially pure, but intercorrelated, intellectual abilities.
Assessment of factorial invariance and stability of individuals with
the Primary Mental Abilities is particularly relevant, given the
influence of Thurstone's work on the field of psychometric
intelligence. Our findings strongly support the measurement
equivalence of the Thurstone battery across much of the adult
life span. We also show that there is a surprising degree of stability
of individual differences in intelligence in participants from the
kind of long-term longitudinal sample obtained in the SLS.

Our conclusions are based on results from a set of relatively
complex longitudinal covariance structures models of the type
developed by Joreskog and co-workers (e.g., Joreskog & Sorbom,
1977). The longitudinal factor model developed by Joreskog and
others (Joreskog, 1979; Joreskog & Sorbom, 1977) may be viewed
as a generalization of other longitudinal factor analysis models
(e.g., models by Corballis, 1973; Corballis & Traub, 1970). To set the
stage for our report, we must first summarize the methodological
features of these models and how their parameters may be used
to assess stability and change in individual differences over time
(see also Hertzog, in press; Horn & McArdle, 1980; Schaie &
Hertzog, 1985).

Let us assume that an investigator has collected multiple
measures of one or more latent variables in a longitudinal design.
The measures may or may not be identical at each longitudinal
measurement occasion, although in the SLS the same measures
were collected at each time of measurement. The relations among
these variables must be represented by the covariance matrix of
the observed variables (a correlation matrix should not be
analyzed; Joreskog & Sorbom, 1977). Given this kind of replicated
longitudinal design, confirmatory factor analysis may be used to
specify and estimate a longitudinal factor model with the
following features.

First, the same factor structure is hypothesized to exist at each
longitudinal measurement occasion. This structure is represented
in the factor pattern matrix, which contains the regression
coefficients mapping variables on factors (factor loadings). In the
analysis we report here, a general intelligence (g) factor was
modeled at each longitudinal occasion. The factors thus specified in
a longitudinal factor model are often termed occasion-specific
factors.¹ In addition to the factor pattern matrix, the basic
longitudinal model includes a factor covariance matrix, describing
the relations among the factors within and between longitudinal
occasions, and a residual covariance matrix. The primary
parameters of interest are the factor loadings and the factor
covariance matrix. The first step involves evaluation of the
measurement equivalence of the observed variables. Measurement
equivalence may be assessed by (a) evaluating the adequacy of
the model postulating isomorphic occasion-specific factors (i.e.,
the same number of factors with the same configuration of factor
loadings at each longitudinal occasion) and (b) determining the
plausibility of a model constraining these factor loadings to be
equal (invariant) over all longitudinal occasions. These factor
loadings are raw-score (unstandardized) regression coefficients,
and invariance of these coefficients (sometimes termed metric
invariance; see Horn, McArdle, & Mason, 1984) implies
unchanging relations of the observed variables to the factors
(Meredith, 1964; Schaie & Hertzog, 1985). Procedures for assessing
the fit of these models are described later in the article.
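To make the two steps concrete, the sketch below builds a factor pattern matrix for several occasions in which each variable loads only on the factor for its own occasion, and then checks whether the loadings are equal across occasions. The loading values are hypothetical and the helper names are invented for this illustration; a real analysis would estimate these values and test the constraint statistically rather than by inspection.

```python
def longitudinal_loading_matrix(loadings, n_occasions):
    """Stack the same column of raw-score loadings once per occasion.

    Returns a (q * n_occasions) x n_occasions matrix as a list of rows:
    variable j at occasion t loads only on the factor for occasion t.
    """
    q = len(loadings)
    rows = []
    for t in range(n_occasions):
        for j in range(q):
            row = [0.0] * n_occasions
            row[t] = loadings[j]
            rows.append(row)
    return rows


def is_metric_invariant(matrix, q, n_occasions, tol=1e-9):
    """True if each variable has the same loading at every occasion."""
    for j in range(q):
        reference = matrix[j][0]
        for t in range(1, n_occasions):
            if abs(matrix[t * q + j][t] - reference) > tol:
                return False
    return True


# Hypothetical loadings for five subtests; the first is fixed at 1.0
# to set the metric of the factor.
loadings = [1.0, 0.8, 0.9, 0.7, 0.6]
pattern = longitudinal_loading_matrix(loadings, 3)
print(len(pattern), len(pattern[0]), is_metric_invariant(pattern, 5, 3))
```

The zero entries correspond to the "same configuration" requirement in step (a); the equality check corresponds to the stronger constraint in step (b).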

Given that the hypothesis of measurement equivalence is
tenable, the second step in the longitudinal analysis shifts attention
to the factor covariance matrix. The diagonal elements of this
matrix (the factor variances) reflect the magnitude of individual
differences at each longitudinal occasion. Changes in factor
variances would therefore reflect changes in the overall magnitude
of individual differences over time. The stability of individual
differences across longitudinal occasions is reflected in the
covariances of factors with themselves over time. If the covariance
of a factor at Time 1 with itself at Time 2 is large and positive,
then individuals are preserving their relative order about the factor
mean between Times 1 and 2. On the other hand, a zero or near
zero covariance would reflect a high degree of flux in individual
differences between Times 1 and 2. As shown by Baltes, Reese,
and Nesselroade (1977), a zero covariance would be consistent
with large individual differences in the patterns of developmental
change during that time period.
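Because a covariance depends on the scale of the factor variances, stability is usually easier to read after converting it to a correlation. The following sketch does that conversion for a factor covariance matrix; the matrix shown is hypothetical and is not an estimate from the study.

```python
import math


def factor_correlation(phi, i, j):
    """Correlation between factors i and j from a covariance matrix."""
    return phi[i][j] / math.sqrt(phi[i][i] * phi[j][j])


# Hypothetical covariance matrix for one factor at three occasions.
phi = [
    [4.0, 3.6, 3.2],
    [3.6, 4.0, 3.6],
    [3.2, 3.6, 4.0],
]
print(factor_correlation(phi, 0, 1), factor_correlation(phi, 0, 2))
```

A value near 1 indicates that individuals largely keep their rank order between the two occasions; a value near 0 indicates the flux described above.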

Given that the SLS is a sequential study, in which multiple
longitudinal samples have been followed over time (see Schaie,
1979, 1983), it is possible to expand the longitudinal model to
consider longitudinal changes in multiple age groups. The
extension of the model to multiple group analysis has been described
by Joreskog and Sorbom (1980), and is relatively straightforward.
The advantage of a multiple groups analysis in the present context
is that it allows us to address the issue of age invariance in factor
structure both longitudinally, within a group of individuals, and
comparatively, across multiple age groups. The longitudinal
samples we analyze include adults of a wide span of chronological
ages who have been tested three times over a 14-year period.
These multiple samples allow us to examine longitudinal
invariance in factor structure over 14-year epochs, while also
examining factorial invariance over the adult life span by comparing
the factor structures of multiple age groups.

Method 

Subjects 

The subjects in this study were participants in the Seattle Longitudinal
Study conducted by Schaie and associates (Schaie, 1979, 1983). The
population consisted of members of a health maintenance organization
(HMO) in the greater Seattle, Washington, area. To minimize the
probability of selection differences over time, the population was defined as
all members of the organization as of 1956, the initial year of the
longitudinal study. All participants were unpaid volunteers who answered
questionnaires and took part in psychometric testing conducted in a
single session. The volunteers were recruited from a randomly drawn
sampling frame of the HMO membership, stratified by age and gender.
The participants were adults spanning the age range from 20 through 74,
at first test, and representing a range of socioeconomic and ethnic groups.
However, probability sampling was not employed, and the sample was
therefore not necessarily representative of the entire HMO population.
As was generally true of the Seattle population circa 1956, the sample is
predominantly Caucasian and, reflecting the membership of the HMO,
contains a higher proportion of middle- and upper-income individuals
than did the total Seattle population. Further details on the population
and sampling procedures may be found in Schaie (1979, 1983).

¹ The model can be extended without difficulty to include different
numbers of common factors at each longitudinal occasion, but that
approach is unnecessary in our analysis.

Sequential  Sampling  Design 

The longitudinal samples studied here are a subset of the sequential
samples collected in the SLS. Briefly, the design of the SLS consisted of
repeated sampling from the population at 7-year intervals, beginning in
1956 and continuing through 1984. Each year of testing, a new
cross-sectional sample was drawn from the population, and all previously tested
individuals were contacted and recruited for participation in the
longitudinal panel. Thus, each independent cross-sectional sample was
transformed into a multiple-cohort longitudinal sequence (Baltes et al., 1977)
by repeated testing of the same individuals. We restrict our analysis here
to two 14-year longitudinal samples: Sample 1 consists of 162 subjects
tested in 1956, 1963, and 1970, and Sample 2, 250 subjects tested in
1963, 1970, and 1977. The data from the two longitudinal sequences
were partitioned into a hybrid sequential data matrix given in Table 1.
This partition created three age groups (young, middle aged, and old)
for simultaneous analysis. These age groups were formed under the
assumption of no cohort differences in factor structure. Although it would
have been desirable to test for both age-related and cohort-related
measurement equivalence, sample sizes were insufficient for such purposes.
Age-related changes in factor structure seemed more likely, a priori, and
earlier work supported the assumption of no cohort differences in factor
structure (Cunningham & Birren, 1980). As can be seen from Table 1,
data from different birth cohorts were pooled to obtain the age groups.
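The pooling of cohort cells into age groups can be checked with a short tally. The sketch below uses the cell sizes from Table 1; the cell size for the 1889 cohort is entered as 13, the value implied by the reported Group 3 total (143) and Sample 1 total (162). The tuple layout and function name are assumptions made for this illustration.

```python
# (age group, sample, cohort mean birth year, n), transcribed from Table 1.
# The n of 13 for Sample 1, cohort 1889, is inferred from the reported totals.
cells = [
    (1, 1, 1931, 21), (1, 1, 1924, 26), (1, 2, 1938, 22), (1, 2, 1931, 40),
    (2, 1, 1917, 27), (2, 1, 1910, 32), (2, 2, 1924, 51), (2, 2, 1917, 50),
    (3, 1, 1903, 28), (3, 1, 1896, 15), (3, 1, 1889, 13),
    (3, 2, 1910, 48), (3, 2, 1903, 18), (3, 2, 1896, 21),
]


def totals(cells, index):
    """Sum cell sizes by the field at `index` (0 = group, 1 = sample)."""
    out = {}
    for cell in cells:
        key = cell[index]
        out[key] = out.get(key, 0) + cell[3]
    return out


group_n = totals(cells, 0)
sample_n = totals(cells, 1)
print(group_n, sample_n)
```

Each age group draws on cohorts from both longitudinal samples, which is why the grouping rests on the assumption of no cohort differences in factor structure.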

Variables

As part of a larger psychometric battery, all of the subjects were
administered the 1948 version of the SRA (Science Research Associates)
Primary Mental Abilities (PMA) test, Form AM 11-17 (Thurstone &
Thurstone, 1949). The 1948 PMA includes five subtests, all of which are
timed and have significant speed components in adult samples (Schaie,
Rosenthal, & Perlman, 1953). They are (a) Verbal Meaning, a test of
recognition vocabulary; (b) Space, a test of spatial orientation requiring
mental rotation in a two-dimensional plane; (c) Reasoning, a test of
inductive reasoning requiring recognition and extrapolation of patterns
of letter sequences; (d) Number, a test of the ability to solve simple
two-column addition problems quickly and accurately; and (e) Word
Fluency, a test of the ability to retrieve words from semantic memory
according to an arbitrary syntactic rule. Scoring protocols followed the
PMA manual. Verbal Meaning and Reasoning are scored in terms of the
number of correct responses; Space and Number are scored by subtracting
commission errors from the total number correct; and Word Fluency is
scored by tallying the total of unique, admissible words generated.
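The three scoring rules just described can be sketched as follows. This is an illustration of the rules as stated in the text, not the PMA manual's procedure; the function names are invented, and the admissibility rule passed to the Word Fluency scorer is a hypothetical stand-in for whatever rule the test specifies.

```python
def score_number_correct(n_correct):
    """Verbal Meaning and Reasoning: number of correct responses."""
    return n_correct


def score_corrected(n_correct, n_commission_errors):
    """Space and Number: total correct minus commission errors."""
    return n_correct - n_commission_errors


def score_word_fluency(words, is_admissible):
    """Word Fluency: count of unique words that satisfy the rule."""
    unique = {word.strip().lower() for word in words}
    return sum(1 for word in unique if is_admissible(word))


# Hypothetical admissibility rule, used only for this example.
def starts_with_s(word):
    return word.startswith("s")


print(score_corrected(30, 4),
      score_word_fluency(["sun", "sea", "Sun", "tree"], starts_with_s))
```

Repeated words are collapsed before counting, which reflects the "unique" requirement; whether case or spelling variants count as repeats is an assumption here.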

Table 1
Reparameterized Sequential Sample for Multiple Group Analysis

                 Cohort                Mean age
Sample      (mean birth year)     O1     O2     O3       n
------------------------------------------------------------
Group 1                           30     37     44     109
  1               1931            25     32     39      21
  1               1924            32     39     46      26
  2               1938            25     32     39      22
  2               1931            32     39     46      40
Group 2                           42     49     56     160
  1               1917            39     46     53      27
  1               1910            46     53     60      32
  2               1924            39     46     53      51
  2               1917            46     53     60      50
Group 3                           58     65     72     143
  1               1903            53     60     67      28
  1               1896            60     67     74      15
  1               1889            67     74     81      13
  2               1910            53     60     67      48
  2               1903            60     67     74      18
  2               1896            67     74     81      21

Note. O1 = first occasion of measurement, O2 = second occasion of
measurement, O3 = third occasion of measurement.

Statistical Procedures

All of the models described were tested using the LISREL V program
of Joreskog and Sorbom (1981). The analyses reported in this article
used only one of LISREL's two factor-analysis measurement models. In
LISREL notation, the measurement model may be specified as

$$x = \Lambda\xi + \delta, \tag{1}$$

which in matrix form specifies a $q$-order vector of observed variables, $x$,
as a function of their regression on $n$ latent variables (factors) in $\xi$, with
regression residuals $\delta$. The $q \times n$ matrix $\Lambda$ contains the regression
coefficients (factor loadings). Equation 1 implies that the covariance matrix
of the observed variables in the population, $\Sigma$, may be expressed as

$$\Sigma = \Lambda\Phi\Lambda' + \Theta, \tag{2}$$

where $\Lambda$ is as before, $\Phi$ is the covariance matrix of the $\xi$, and $\Theta$ is the
covariance matrix of the $\delta$. Equation 2 is a restricted factor analysis model
that can be extended to multiple groups (Joreskog, 1971).
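A small numerical sketch may help fix the meaning of Equation 2: given a loading matrix, a factor covariance matrix, and a residual covariance matrix, the model-implied covariance matrix of the observed variables is the product of loadings, factor covariances, and transposed loadings, plus the residual covariances. The matrices below are hypothetical (one factor, three indicators), and the helper functions are written out only to keep the example self-contained.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(column) for column in zip(*a)]


def implied_covariance(lam, phi, theta):
    """Model-implied covariance: loadings * phi * loadings' + theta."""
    common = matmul(matmul(lam, phi), transpose(lam))
    size = len(theta)
    return [[common[i][j] + theta[i][j] for j in range(size)]
            for i in range(size)]


# Hypothetical one-factor model with three observed variables.
lam = [[1.0], [0.5], [2.0]]          # factor loadings
phi = [[4.0]]                        # factor variance
theta = [[1.0, 0.0, 0.0],
         [0.0, 2.0, 0.0],
         [0.0, 0.0, 3.0]]            # residual variances
sigma = implied_covariance(lam, phi, theta)
for row in sigma:
    print(row)
```

Estimation then amounts to choosing the free elements of these matrices so that the implied matrix is as close as possible, under the fitting function, to the sample covariance matrix.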

The parameters of LISREL's restricted factor analysis model are
estimated by the method of maximum likelihood, provided that a unique
solution to the parameters has been defined by placing a sufficient number
of restrictions on the equations in Equation 2 to identify the remaining
unknowns. Restrictions are specified by either (a) fixing parameters to a
known value a priori (e.g., requiring that a variable is unrelated to a
factor by fixing its regression in $\Lambda$ to 0) or (b) constraining a set of two
or more parameters to be equal. The equality constraints may be applied
to any subset of parameters within or between groups, which provides
the basis for specifying a model requiring invariant factor structures
between multiple groups or across longitudinal occasions (as needed, for
example, to test the hypothesis of measurement equivalence).
Overidentified models (which have more restrictions than are necessary to identify
the model parameters) place restrictions on the hypothesized form of $\Sigma$,
which may be used to test the goodness of fit of the model to the data
using the likelihood ratio test statistic. Differences in chi-square between nested
models (models that have the same specification, with additional
restrictions in one model) may be used to test the null hypothesis that the
restrictions (e.g., constrained equal factor loadings) are true in the
population.
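The nested-model comparison can be sketched in a few lines: subtract the chi-square and degrees of freedom of the less restricted model from those of the more restricted model, and refer the difference to a chi-square distribution. The fit values below are hypothetical, and the tail probability is computed with a standard series for the incomplete gamma function so that the example needs only the standard library; it is not the routine used by LISREL.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with df degrees
    of freedom, via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_difference(chisq_restricted, df_restricted,
                          chisq_free, df_free):
    """Difference test for nested models; returns (difference, df, p)."""
    difference = chisq_restricted - chisq_free
    df = df_restricted - df_free
    return difference, df, chi2_sf(difference, df)


# Hypothetical fit statistics for a constrained and an unconstrained model.
difference, df, p = chi_square_difference(120.0, 60, 110.0, 52)
print(difference, df, round(p, 4))
```

A large p value here would mean the added equality constraints do not significantly worsen fit, so the hypothesis of invariance would not be rejected.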

In multiple group, longitudinal factor analysis, it is necessary to estimate
factor models using covariance metric and sample covariance matrices
rather than to analyze separately standardized correlation matrices.
Standardization could obscure invariant factor structures because of group
differences in observed variances (Jöreskog, 1971), and would not allow
evaluation of longitudinal changes in factor variances. To estimate raw
score factor pattern weights and factor variances, one must identify the
metric of the factors by fixing a single regression in each column of Λ to
a constant (conveniently, 1.0), and then interpret results while considering
the metric of latent and observed variables. The analyses reported here
do so. Nevertheless, as standardized factor loadings (etc.) are easier to
interpret, we provide parameter estimates that have been rescaled to a
quasi-standardized metric, using a SAS PROC MATRIX program for scaling
longitudinal factor analyses.¹ This rescaling preserves longitudinal
constraints on parameter estimates but returns scaled values for factor
loadings that are similar to standardized factor loadings. We also report
maximum likelihood estimates and standard errors for certain models so
that the reader may evaluate (a) a null hypothesis that each parameter is
equal to zero, or (b) that group differences in unconstrained parameters
are statistically reliable. In general, parameters that exceed their
standard errors by a ratio of 2:1 are reliably different from zero at a 5%
(per comparison) alpha level.
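
The 2:1 rule of thumb can be applied mechanically to any reported estimate and standard error. The sketch below is illustrative only: the function name `exceeds_two_standard_errors` is hypothetical, and the estimates and standard errors are the freely estimated raw-score loadings listed in Table 3.

```python
# Raw-score factor loadings and standard errors as listed in Table 3.
# Reasoning is omitted because its loading is fixed at 1.0 and has no standard error.
ESTIMATES = {
    "Verbal Meaning": (1.540, 0.100),
    "Space": (0.994, 0.109),
    "Number": (0.928, 0.108),
    "Word Fluency": (1.108, 0.133),
}


def exceeds_two_standard_errors(estimate, standard_error):
    """Return the estimate/SE ratio and whether it passes the 2:1 rule of thumb."""
    ratio = estimate / standard_error
    return ratio, abs(ratio) > 2.0


ratios = {}
for test, (estimate, standard_error) in ESTIMATES.items():
    ratio, reliable = exceeds_two_standard_errors(estimate, standard_error)
    ratios[test] = (ratio, reliable)
    print(f"{test:15s} ratio = {ratio:5.2f}  reliably nonzero: {reliable}")
```

Because the rule is applied per comparison, it does not control the error rate across the whole set of parameters.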

Results 

The longitudinal models we estimate are designed to test the
properties of the second-order general intelligence factor (g) from
the PMA identified by Thurstone and Thurstone (1941). A first
step was to determine that the g factor was an adequate
representation of the covariance structure of the five PMA subtests.
Bechtoldt (1974) and Corballis and Traub (1970) worked with a
two-factor representation of the PMA subtests, although
Bechtoldt's work included an additional memory variable that was
not included in the 1948 PMA, and Corballis and Traub's
two-factor model appeared to produce a very weak second factor.
Nevertheless, we considered it necessary to evaluate the
sufficiency of the g factor model before proceeding to longitudinal
analysis. To do so, we used an exploratory factor analysis of all
first-occasion cross-sectional data from the SLS (N = 2,202) to
estimate an unrestricted maximum likelihood factor solution.
The results for the one-factor model clearly indicated that the g
factor sufficiently accounted for the covariance structure, χ²(5,
N = 2,202) = 6.18, p > .25; Tucker-Lewis reliability = .997.

Longitudinal Model: Sample 1

Prior to analyzing the multiple age groups, we first analyzed
the longitudinal factor model for the entire Sample 1. This
analysis permitted us to evaluate the structural model prior to
engaging in the more complex multiple group models reported
later in the article. The basic occasion-specific model is depicted
in Figure 1. The g factor was specified at each longitudinal
occasion. The metric of g was defined by fixing the loading of
Reasoning on g to 1.0. The remaining four factor loadings at
each occasion were freely estimated, but were constrained to be
equal across longitudinal occasions. By design, the loadings of
all of the other variables (e.g., Verbal Meaning at Time 3 on g
at Time 1) were fixed at 0. The factor covariance matrix was
freely estimated, and the residual covariance matrix was specified
as a diagonal matrix of unique variances.
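
The structure of this specification can be made concrete with a small sketch of the factor pattern matrix Λ and the common part of the implied covariance matrix, ΛΦΛ'. Everything here is illustrative: the function names are hypothetical, and the numerical values are borrowed from the estimates reported later for the accepted model (raw-score loadings in Table 3, factor covariances in Table 4). The residual matrix Θ is left out because raw-score residual variances are not reported.

```python
# Raw-score loadings in the order Verbal Meaning, Space, Reasoning, Number,
# Word Fluency (Table 3); Reasoning is fixed at 1.0 to set the metric of g.
LOADINGS = [1.540, 0.994, 1.000, 0.928, 1.108]

# Factor covariance matrix for g at Times 1, 2, and 3 (Table 4).
PHI = [
    [28.624, 27.723, 31.776],
    [27.723, 30.062, 34.528],
    [31.776, 34.528, 41.938],
]


def build_lambda(loadings, n_occasions=3):
    """15 x 3 pattern matrix: each test loads only on g at its own occasion."""
    n_tests = len(loadings)
    lam = [[0.0] * n_occasions for _ in range(n_tests * n_occasions)]
    for occasion in range(n_occasions):
        for j, value in enumerate(loadings):
            # Same loading at every occasion = longitudinal equality constraint.
            lam[occasion * n_tests + j][occasion] = value
    return lam


def common_covariance(lam, phi):
    """Common-factor part of the implied covariance matrix, Lambda Phi Lambda'."""
    n, k = len(lam), len(phi)
    return [
        [
            sum(lam[i][a] * phi[a][b] * lam[j][b] for a in range(k) for b in range(k))
            for j in range(n)
        ]
        for i in range(n)
    ]


lam = build_lambda(LOADINGS)
common = common_covariance(lam, PHI)
# Row/column 2 is Reasoning at Time 1; row/column 7 is Reasoning at Time 2.
print(round(common[2][7], 3))
```

With a diagonal Θ, every covariance between a test at one occasion and any test at another occasion must be reproduced by this common part alone, which is the restriction the next paragraphs expect to fail.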

We hypothesized in advance that this model would not fit the
data because of the diagonal specification for the residual
covariance matrix. It is well known that longitudinal factor models
of the type we are working with are likely to require what has
been termed autocorrelated residuals (Sörbom, 1975; Wiley &
Wiley, 1970). That is, given that it is likely that the
occasion-specific factors will not account for all the reliable variance in
the observed variables, it is plausible to expect that the
residuals (specific components) for an observed variable will
correlate over time. In other words, we expected a residual covariance
between the residual for Verbal Meaning at Time 1 and the Verbal
Meaning residual at Time 2, a residual covariance between the
Time 1 Space residual and the Time 2 Space residual, and so
on. This residual pattern was especially likely, given that we are
estimating a second-order g factor, as in this case the residual
will include variance in the primary ability not accounted for
by g. In fact, one would expect from the literature on abilities
that the communalities for variables like Space and Number
determined by g would be relatively small.

The initial model, denoted O₁, specifying a diagonal matrix
of unique variances, provided an exceptionally poor fit to the
data (see Table 2). The poor fit was underscored by the fact that
the estimated factor covariances were greater than the
corresponding factor variances (which implies the logical absurdity
of correlations greater than 1). We therefore estimated Model
O₂, specifying autocorrelated residuals in the residual covariance
matrix. The improvement in fit was substantial, change in χ²(15,
N = 162) = 898.64, p < .001. Indeed, the overall chi-square test
statistic was no longer significant, and the normed fit index was
.96, indicating that nearly all the covariance in the sample data
matrix was accounted for by the model.
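
The degrees of freedom of these two models, and the 15 degrees of freedom of the test between them, follow from counting free parameters. The sketch below is a bookkeeping illustration under the specification described in the text (five tests, three occasions, Reasoning fixed at 1.0, loadings equal over time); the variable names are hypothetical.

```python
from itertools import combinations

TESTS = ["Verbal Meaning", "Space", "Reasoning", "Number", "Word Fluency"]
OCCASIONS = [1, 2, 3]

n_observed = len(TESTS) * len(OCCASIONS)              # 15 observed variables
n_moments = n_observed * (n_observed + 1) // 2        # distinct variances/covariances

free_loadings = len(TESTS) - 1                        # Reasoning is fixed; equal over time
n_factors = len(OCCASIONS)
factor_moments = n_factors * (n_factors + 1) // 2     # factor variances and covariances
unique_variances = n_observed                         # diagonal of the residual matrix

# Autocorrelated residuals: one covariance per test and pair of occasions.
autocorrelated = [
    (test, first, second)
    for test in TESTS
    for first, second in combinations(OCCASIONS, 2)
]

df_diagonal = n_moments - (free_loadings + factor_moments + unique_variances)
df_autocorrelated = df_diagonal - len(autocorrelated)

print(df_diagonal, df_autocorrelated, len(autocorrelated))
```

The counts agree with the degrees of freedom listed in Table 2 for the first two models (95 and 80).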

At this point, our interest shifted to testing hypotheses
regarding cross-occasion invariance in the parameter matrices. The
principal hypothesis of interest with respect to measurement
equivalence involved the invariance of the raw-score factor
pattern weights (factor loadings) in Λ. Model O₃ relaxed the
constraint that the factor pattern weights be equal across occasions.
The difference in fit was nonsignificant, indicating that the
hypothesis of equal weights could not be rejected.

Given invariant factor pattern weights, it was meaningful to
ask whether the factor variances were stationary over time,
indicating consistency in the magnitude of individual differences
on g. Model O₄ tested this hypothesis by constraining the diagonal
elements of the factor covariance matrix to be equal across
longitudinal occasions. This hypothesis was rejected (see Table 2).
Thus we concluded that there were changes in the magnitude of
individual differences over occasions. We were also able to reject
the null hypothesis that the factor covariances were equal (see
Model O₅ of Table 2).

Next, our attention turned to the parameters in the residual
covariance matrix. Our first hypothesis was that the residual
covariances could be constrained equal over occasions. This
hypothesis, if tenable, would suggest a high degree of stability of
individual differences in the ability-specific residual components.
As can be seen in Table 2, Model O₆, imposing the equality
constraints on the residual covariances, did not fit worse than
Model O₂, indicating that the hypothesis of equal covariances
could not be rejected. Finally, we tested the hypothesis of
longitudinal invariance in the residual variances. This hypothesis
stipulates that longitudinal changes in the variances of the
observed variables could be attributed to changes in g factor
variance alone. This model, labeled O₇ in Table 2, was rejected as
an equivalent representation to Model O₆. We concluded that
there were occasion-specific differences in the unique variances
as well as in the factor variances.

Figure 1. Initial longitudinal factor model specifying general intelligence factor (g) at each of three longitudinal
occasions. (Subsequent models include covariances among corresponding residuals [e.g., the residuals of the
same test at Times 1, 2, and 3] over time.)

¹ Briefly, the scaling is accomplished by pooling estimated latent
variances and estimated observed variances to obtain scaling matrices.
Pooling is done over multiple groups, as in Jöreskog (1971), and also over
longitudinal occasions. A set of scaling equations and a listing of the
scaling program is available from the first author on request.

The factor loadings and their associated standard errors for the
accepted model (O₆) are given in Table 3. All factor loadings are
significant, but the rescaled factor loadings for Verbal Meaning
and Reasoning are clearly larger than the rest. This pattern is
consistent with the factor analytic literature on second-order
ability factors (e.g., Horn, 1978), and parallels the findings of
Thurstone and Thurstone (1941).

This pattern is also reflected in the standardized residual
variances, where the smallest residuals (largest communalities) are
associated with Verbal Meaning and Reasoning. Note also the


Table 2

Statistics for Alternative Longitudinal Models

Model                            χ²       df    p      NFIᵃ   Comparison   Δχ²      Δdf   p        ΔNFI

O₁ (Λₜ =,ᵇ diag Θᶜ)              985.84   95    .000   .574   —            —        —     —        —
O₂ (Λₜ =, Θ autocorrelatedᵈ)      87.20   80    .27    .962   O₁−O₂        898.64   15    < .001   .388
O₃ (Λₜ unconstrained)             82.98   72    .17    .964   O₂−O₃          4.22    8    ns       .002
O₄ (Λₜ =, diag Φₜ =ᵉ)            112.90   82    .013   .951   O₄−O₂         25.70    2    < .001   .011
O₅ (Λₜ =, Φₜ =ᶠ)                 121.78   84    .005   .947   O₅−O₄          8.88    2    < .05    .004
O₆ (Λₜ =, cov Θ =ᵍ)               97.16   90    .28    .958   O₆−O₂          9.96   10    ns       .004
O₇ (Λₜ =, Θₜ =ʰ)                 129.21  100    .026   .944   O₇−O₆         32.05   10    < .05    .014

ᵃ Bentler-Bonett normed fit index.
ᵇ Indicates nonzero factor pattern weights in Λ constrained to be equal over time (t).
ᶜ Indicates the residuals in Θ specified as uncorrelated (see text).
ᵈ Indicates autocorrelated residuals in Θ. This specification was continued in Models O₃–O₇ as well.
ᵉ Indicates factor variances in Φ constrained to be equal over time.
ᶠ Indicates factor covariances constrained equal and factor variances constrained equal over time.
ᵍ Indicates covariances among residuals constrained equal over time.
ʰ Indicates residual variances constrained equal over time, and residual covariances constrained equal over time.

Table 3

Factor Loadings and Residual Variances for the Longitudinal Factor Model (O₆)

                     Factor loadings                        Residual variancesᵃ
Test              LISREL estimatesᵇ   Rescaled loadings   Time 1   Time 2   Time 3

Verbal Meaning    1.540 (0.100)       .838                .318     .348     .240
Space             0.994 (0.109)       .556                .751     .666     .652
Reasoning         1.000ᶜ (—)          .878                .269     .274     .162
Number            0.928 (0.108)       .518                .760     .763     .614
Word Fluency      1.108 (0.133)       .520                .774     .735     .682

ᵃ Calculated as the proportion of residual variance (estimated) to total variance (estimated); 1 − (residual variance) = the communality.
ᵇ Standard errors in parentheses.
ᶜ Fixed parameter.


longitudinal decreases in residual variances for all variables,
suggesting that the communalities of the primary ability variables
determined by g increase over time. The high degree of stability
in individual differences is reflected in the high factor covariances,
which are provided in Table 4. Standardized, these covariances
reflect correlations of greater than .9 between g at each
longitudinal occasion. Clearly, there is not much change in the relative
ordering of individuals on general intelligence over the 14-year
period.
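
The standardization mentioned here is the usual conversion of a covariance to a correlation, r = cov / sqrt(var₁ · var₂). The sketch below applies it to the factor variances and covariances listed in Table 4; the function name `standardize` is hypothetical.

```python
import math

# Factor variances (diagonal) and covariances for g at Times 1-3, from Table 4.
PHI = [
    [28.624, 27.723, 31.776],
    [27.723, 30.062, 34.528],
    [31.776, 34.528, 41.938],
]


def standardize(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


corr = standardize(PHI)
for (i, j) in [(0, 1), (0, 2), (1, 2)]:
    print(f"r(g{i + 1}, g{j + 1}) = {corr[i][j]:.3f}")
```

All three correlations exceed .9, matching the values shown above the diagonal of Table 4.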

The results of this model were successfully cross-validated in
Sample 2. Rather than report these results, we move immediately
to discussion of the multiple group analysis.

Multiple  Group  Analysis 

The analyses in Samples 1 and 2 suggest almost perfect stability
of individual differences in intelligence, both at the g factor and
test-specific component levels. These analyses combined
individuals spanning the adult life span, however, and it was possible
that the wide age range served to maximize the apparent stability
of individual differences. In particular, it was possible that
differential change in the late-middle-age/old-age ranges was
obscured by the high degree of stability across most of the adult
life span. The multiple group analyses were designed to examine
the stability of individual differences in more homogeneous age
ranges. They also afforded us the opportunity of looking at age
group differences in the factor analysis parameters. One might
expect that there would be a greater opportunity for age group
differences in factor loadings (given the age ranges spanned by
our groups) than for longitudinal age changes.

We began by testing the equality of the observed covariance
matrices across the three age groups. Box's test suggested
nonhomogeneous covariance matrices, M = 402.77, F(240, ∞) =
1.59, p < .0001. This result made it likely that there indeed were
group differences in some of the factor analytic parameters.

The longitudinal factor model investigated in Sample 1 was
used in the multiple group analyses. However, rather than
presume the equivalence of residual covariances (as in Model O₆
above), we chose to begin with these parameters unconstrained.
Our rationale was that group differences in the residual
covariance structure might have been obscured in the single sample
analysis. Rather than presume the constraints, we chose to
evaluate them anew in the multiple group model.

Table 4

Factor Covariance Matrix (and Correlations) for the
Longitudinal Factor Model (O₆)

Factor   g₁               g₂               g₃

g₁       28.624 (4.137)   0.945            0.917
g₂       27.723 (3.983)   30.062 (4.338)   0.972
g₃       31.776 (4.531)   34.528 (4.787)   41.938 (5.728)

Note. g₁ is the general factor at Time 1, g₂ is the general factor at Time
2, g₃ is the general factor at Time 3. Standard errors in parentheses.
Values above the diagonal are standardized factor correlations.

Our basic model, then, posited the specification of Model O₃
of the Sample 1 analyses: an occasion-specific g factor (with no
longitudinal constraints on the factor loadings), a freely estimated
factor covariance matrix, and a residual covariance matrix with
free unique variances and autocorrelated residual covariances.
This model was specified in each of the three age groups, with
no additional constraints on the parameters across the groups.
The model was therefore equivalent to running the longitudinal
factor model separately in the three groups.

As can be seen from the first entry in Table 5, this model,
denoted M₁, provided a relatively good fit to the data, allowing
us to conclude that it was a reasonable representation of the
covariance matrices in each group. We therefore proceeded to
test for invariance in the g factor loadings. Separate tests of the
equality of the factor loadings across age groups (Model M₂) and
longitudinally across occasions (Model M₃) did not fit worse than
the model with no constraints on the factor loadings (see Table
5). For both tests, the combined change in chi-square was actually
just less than the change in degrees of freedom, χ²(32,
N = 412) = 29.82, ns. We therefore concluded that the g factor
loadings demonstrated complete age equivalence, being
invariant both longitudinally and between age groups.
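
The degrees of freedom of the first three multiple group models can be reproduced by counting parameters across the three age groups. The sketch below is a bookkeeping illustration under the specification described in the text (per group: 15 observed variables, a freely estimated 3 × 3 factor covariance matrix, 15 unique variances, and 15 autocorrelated residual covariances); the names are hypothetical.

```python
N_GROUPS = 3
N_TESTS = 5
N_OCCASIONS = 3

n_observed = N_TESTS * N_OCCASIONS
moments_per_group = n_observed * (n_observed + 1) // 2          # 120

free_loadings_per_occasion = N_TESTS - 1                        # Reasoning fixed at 1.0
loadings_per_group = free_loadings_per_occasion * N_OCCASIONS   # no equality over time

factor_moments = N_OCCASIONS * (N_OCCASIONS + 1) // 2           # 6
unique_variances = n_observed                                   # 15
residual_covariances = N_TESTS * 3                              # 3 occasion pairs per test
other_params_per_group = factor_moments + unique_variances + residual_covariances


def total_df(distinct_loading_params):
    """Degrees of freedom summed over groups for a given number of distinct loadings."""
    total_moments = N_GROUPS * moments_per_group
    total_params = distinct_loading_params + N_GROUPS * other_params_per_group
    return total_moments - total_params


df_all_free = total_df(N_GROUPS * loadings_per_group)   # loadings free in every group
df_equal_groups = total_df(loadings_per_group)          # loadings equal between groups
df_equal_groups_time = total_df(free_loadings_per_occasion)  # also equal over time

print(df_all_free, df_equal_groups, df_equal_groups_time)
```

The three totals agree with the degrees of freedom listed in Table 5 for M₁, M₂, and M₃, and their differences (24 and 8) sum to the 32 degrees of freedom of the combined test.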

Our next set of models examined invariance in the factor
covariance matrix. Model M₄, requiring age group equivalence in
the factor covariance matrix (both variances and covariances),
significantly degraded the fit to the data, requiring rejection of
the null hypothesis of age group equivalence. We next tested a
less restrictive model, positing group equivalence in factor
variances but not in covariances. This model (M₅) was also rejected.
Finally, Model M₆, placing no group constraints on the variances
but positing longitudinal equality of variances within each group,
was also rejected by the data (see Table 5). We should note that
none of these models greatly degraded the fit, as judged by the
normed fit index change of .01 or less (see Bentler & Bonett,
1980). Nevertheless, the loss of fit, judged from the likelihood
ratio chi-square test, was significant. These results indicated that
the factor covariance matrices should neither be taken to be
stationary over time nor equivalent across age groups.

Table 5

Goodness-of-Fit Statistics for Models With Multiple Groups

Model                            χ²       df    p      NFIᵃ   Comparison   Δχ²      Δdf   p        ΔNFI

M₁ (all free)ᵇ                   257.85   216   .027   .951   —            —        —     —        —
M₂ (Λ = over groups)ᶜ            284.24   240   .026   .946   M₂−M₁         26.39   24    ns       .005
M₃ (Λ = over time)ᵈ              287.68   248   .042   .945   M₃−M₂          3.44    8    ns       .001
M₄ (Φ = over groups)ᵉ            329.65   260   .002   .937   M₄−M₃         41.97   12    < .01    .008
M₅ (var Φ = over groups)ᶠ        310.68   254   .004   .941   M₅−M₃         23.00    6    < .01    .004
M₆ (var Φ = over time)ᵍ          301.28   254   .022   .943   M₆−M₃         14.00    6    < .05    .002
M₇ (Θ = over groups)ʰ            438.83   308   .000   .913   M₇−M₃        171.17   60    < .001   .032
M₈ (cov Θ = over time)ⁱ          331.77   278   .015   .937   M₈−M₃         44.09   30    < .05    .008

ᵃ Bentler-Bonett normed fit index.
ᵇ Indicates no between-groups equality constraints among parameters.
ᶜ Indicates factor loadings constrained equal between groups.
ᵈ Indicates factor loadings constrained equal between groups (as in M₂) and constrained equal over time (this specification maintained in subsequent models).
ᵉ Indicates factor covariance matrices constrained equal between groups.
ᶠ Indicates factor variances constrained equal over groups.
ᵍ Indicates factor variances constrained equal over time in each of the groups.
ʰ Indicates entire residual covariance matrix constrained equal over groups.
ⁱ Indicates residual covariances for test-specific components constrained equal over time.

Finally, we pursued the residual covariance structure to assess
the stability of the residual variances and covariances across time.
A preliminary model, M₇, specified group invariance in all three
parameter matrices (Λ, Φ, and Θ). Compared to Model M₃, this
model tests the age group equivalence of the residual covariance
matrix. The hypothesis was convincingly rejected. Our next step
was to evaluate the plausibility of a model constraining the
residual covariances to be equal between different measurement
occasions (as was the case for Model O₆ in the single sample
analysis). Model M₈ placed these constraints on the residuals.
The loss of fit was marginally significant at the 95% confidence
level. We concluded that the model specifying equal covariances
had missed the mark, but not by much. Thus, unlike Model O₆,
we could not treat the residual covariances as invariant over
longitudinal occasions in the multiple group analysis. Apparently,
both the residual variances and covariances differed by group
and over longitudinal occasions, although the loss of fit due to
group constraints was clearly much greater than the loss due to
fitting invariant residual covariances over longitudinal occasions
in each of the groups separately.

An alternative method for approaching stability in the residual
covariances is by specification of a model positing both
occasion-specific and test-specific factors (e.g., Jöreskog & Sörbom, 1977).
Figure 2 depicts the factor pattern matrix (Λ) associated with a
combined occasion-specific and test-specific factor model for
these data. A given variable loads both on the general factor and
its own test-specific factor (i.e., a Verbal Meaning factor, a Space
factor, and so on). This parameterization of the residual
covariances is plausible if one argues for a special relation among the
residuals over time: a first-order autoregressive structure (see
Jöreskog & Sörbom, 1977). Addition of test-specific factors places
no additional restrictions on the residual covariances, given that
there are only three occasions of measurement (with more
occasions, specification that the residual covariances form a single
common factor may not fit the residual covariance structure).
The advantage of the test-specific factor representation is that it
enables one to separately estimate components of variance
associated with g, stable variance in the primary ability, and a
residual consisting of unstable variance plus measurement error
(see Hertzog, in press).

We reestimated Model M₃ (invariant factor loadings only) with
test-specific factors. The parameter estimates and standard errors
are provided in Tables 6 and 7. Given the fact that the hypothesis
of invariant g factor loadings had been found plausible, we were
entitled to assume measurement equivalence and to evaluate the
remaining parameter estimates with respect to the issue of
stability and change in intelligence. Several points of interest
regarding the stability of individual differences emerged. First, the
factor covariances were again extremely high, indicating a great
degree of stability in individual differences in g over the 14-year
interval for all three age groups. Standardized, these factor
correlations are approximately .9 (or greater) for all groups (see
Table 7).

Table 8 summarizes the stability of individual differences by
reporting the correlations, r², and the estimated autoregressive
coefficients predicting g from the previous longitudinal occasion.
As can be seen from Table 8, the r² is larger for g₂ to g₃ in all
groups, accounting for 92% of the variance in g₃ in both the
middle-aged and old groups. The predominance of stability is
underscored by the regression coefficients reported in Table 8.
As suggested by Kessler and Greenberg (1981), we have expressed
the raw-score slope coefficients in terms of the stability and, as
given in the last column of Table 8, the regression of the change
scores on initial scores (e.g., the regression of g₂ − g₁ on g₁). This
latter coefficient, if negative, suggests regression to the mean; if
positive, it suggests increasing differences between individuals
that covary with initial differences. Table 8 shows that the
raw-score slopes were very near 1.0 (suggesting high stability) and
that the change slopes were near zero (suggesting little change
variance predictable from initial scores). In both the
middle-aged and old groups, the change slopes were slightly negative from
g₁ to g₂, suggesting slight regression to the mean, and slightly
positive from g₂ to g₃, suggesting some egression from the mean
(the rich getting richer, the poor poorer, as it were). In the young
group, the stabilities were lower, albeit still impressively large,
and the regression to the mean was consistent across time
intervals.

                          FACTORS

        g₁    g₂    g₃        V     S     R     N     W

V₁      λ₁    0     0         λ₅    0     0     0     0
S₁      λ₂    0     0         0     λ₆    0     0     0
R₁      1     0     0         0     0     λ₇    0     0
N₁      λ₃    0     0         0     0     0     λ₈    0
W₁      λ₄    0     0         0     0     0     0     λ₉
V₂      0     λ₁    0         1     0     0     0     0
S₂      0     λ₂    0         0     1     0     0     0
R₂      0     1     0         0     0     1     0     0
N₂      0     λ₃    0         0     0     0     1     0
W₂      0     λ₄    0         0     0     0     0     1
V₃      0     0     λ₁        λ₁₀   0     0     0     0
S₃      0     0     λ₂        0     λ₁₁   0     0     0
R₃      0     0     1         0     0     λ₁₂   0     0
N₃      0     0     λ₃        0     0     0     λ₁₃   0
W₃      0     0     λ₄        0     0     0     0     λ₁₄

Figure 2. Factor pattern matrix for model including occasion-specific and test-specific factors.
(0's and 1's are fixed parameters; λ's are estimated by the model.)
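
The stability coefficients discussed here can be computed directly from a pair of factor variances and their covariance: the correlation is r = cov / sqrt(var₁ · var₂), the raw-score slope is b = cov / var₁, and the slope of the change score on the initial score is b − 1. The sketch below uses the young group's estimates for g₂ and g₃ from Table 7; the function name `stability_coefficients` is hypothetical.

```python
import math


def stability_coefficients(var_prior, var_later, cov):
    """Correlation, r squared, raw-score slope, and change-score slope."""
    r = cov / math.sqrt(var_prior * var_later)
    slope = cov / var_prior          # regression of later g on prior g
    return {
        "r": r,
        "r_squared": r * r,
        "unexplained": 1.0 - r * r,
        "slope": slope,
        "change_slope": slope - 1.0,  # regression of (later - prior) on prior
    }


# Young group, Table 7: var(g2) = 11.959, var(g3) = 10.970, cov(g2, g3) = 10.690.
young_g2_g3 = stability_coefficients(11.959, 10.970, 10.690)
for name, value in young_g2_g3.items():
    print(f"{name:12s} {value: .3f}")
```

A negative change slope, as here, corresponds to regression to the mean; a positive one would correspond to increasing differences between individuals.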

The patterns of stability and change identified in the regression
coefficients were mirrored in the factor variances, which exhibited
different patterns of change across each of the groups. Factor
variances decreased in the young group, but showed reliable
increases from the second to the third occasion of measurement
in both the middle-aged and old groups. This increase in g
variance was consistent with the regression from the mean suggested
from the regression coefficients. The decreases in variance and
the regression to the mean pattern in the young group may reflect
the mild ceiling effects on Verbal Meaning and Reasoning that
we have observed in the youngest age groups in the SLS
longitudinal samples.

Third, factor variances varied in magnitude between the age
groups. The older group was generally more heterogeneous (had
greater individual differences in g) than were the young and
middle-aged groups. Taken together, these results suggested that
although there was significant stability of individual differences in
all age groups, the old group showed an interesting pattern of
(a) greater variability in g at initial measurement and (b)
increasing variability over time.²

An alternative way of looking at stability is the decomposition
of variance in the model including both occasion-specific and
test-specific factors. As can be seen in Table 9, the preponderance
of g variance at the second and third occasions of measurement
is stable variance predicted by individual differences at the prior
measurement occasion. Given that we were studying the
second-order g factor, it is relevant to ask about the stability of the residual
components, reflecting the five primary ability factors from the
PMA. Table 9 reports the decomposition of variance on each of
the 15 observed variables for each group into proportions of (a)
g-related variance, (b) stable test-specific variance, and (c) residual
variance. The g-related variance components are actually the
communalities of the observed variables with respect to the g

² One concern we had was that the patterns of factor variances might
be due to the different age span for the oldest group (see Table 1). We
therefore reanalyzed the data, using only the two oldest cohorts in Samples
1 and 2 to form a smaller old group. The redefinition of the old group
did not eliminate the higher variances in g for the old, but did attenuate
the longitudinal increases in variance. This analysis is discussed in more
detail in the second article in this series (Hertzog & Schaie, 1986).


LONGITUDINAL  COVARIANCE  STRUCTURES 


Table 6

Factor Loadings for Model With Occasion-Specific (g) and Test-Specific Factors

Variable   gᵃ              g*ᵇ     Test (Young)ᶜ   Test (Middle aged)ᶜ   Test (Old)ᶜ

V₁         1.659 (.098)    .767    1.032 (.129)    0.921 (.122)          0.650 (.193)
S₁         0.948 (.087)    .438    1.001 (.084)    0.908 (.107)          1.136 (.208)
R₁         1.000* —        .777    0.752 (.174)    1.120 (.151)          0.708 (.199)
N₁         1.463 (.106)    .588    1.005 (.086)    0.962 (.058)          0.935 (.084)
W₁         1.340 (.118)    .485    0.667 (.102)    1.049 (.102)          1.046 (.104)
V₂         1.659 (.098)    .767    1.000* —        1.000* —              1.000* —
S₂         0.948 (.087)    .438    1.000* —        1.000* —              1.000* —
R₂         1.000* —        .777    1.000* —        1.000* —              1.000* —
N₂         1.463 (.106)    .588    1.000* —        1.000* —              1.000* —
W₂         1.340 (.118)    .485    1.000* —        1.000* —              1.000* —
V₃         1.659 (.098)    .767    0.971 (.120)    0.820 (.117)          1.042 (.323)
S₃         0.948 (.087)    .438    0.965 (.089)    0.770 (.095)          1.130 (.211)
R₃         1.000* —        .777    0.920 (.208)    1.006 (.133)          0.740 (.196)
N₃         1.463 (.106)    .588    0.970 (.080)    0.868 (.053)          0.786 (.074)
W₃         1.340 (.118)    .485    0.988 (.126)    0.925 (.086)          0.928 (.092)

Note. Standard errors are in parentheses. Asterisks denote fixed parameters. Subscripts on variables indicate longitudinal occasion (1 = Time 1, 2 =
Time 2, 3 = Time 3). V = Verbal Meaning; S = Space; R = Reasoning; N = Number; W = Word Fluency.
ᵃ Factor loadings for occasion-specific general factor (g). Estimates were constrained equal across the 3 longitudinal occasions.
ᵇ Rescaled general factor loadings.
ᶜ Test-specific factor loadings for each age group.


factor. The variance associated with the test-specific factor
represents stable variance across occasions specific to the primary
ability. The residual variance represents a combination of
measurement error variance and unstable specific variance (the two
components cannot be disentangled in this analysis). There are
several points of interest in Table 9. First, the communalities of
the g factor increased substantially in the old group relative to
the young and middle-aged groups (and showed a tendency to
increase over time longitudinally as well). Thus g determines
more of the variance of the observed measures in the old than
in the young. Second, those variables with the lowest
communalities for g (Space, Number, Word Fluency) show very high levels

Table 7

Factor Covariance Matrices for Occasion-Specific
Factors in Each Age Group

Factor   g₁               g₂               g₃

Young
g₁       15.048 (2.868)   0.887            0.930
g₂       11.89? (2.409)   11.959 (2.421)   0.933
g₃       11.951 (2.365)   10.690 (2.?9)    10.970 (2.257)

Middle aged
g₁       16.797 (2.691)   0.927            0.960
g₂       16.204 (2.549)   16.761 (2.652)   0.959
g₃       16.786 (2.6??)   16.760 (2.591)   18.204 (2.798)

Old
g₁       23.546 (3.595)   0.944            0.885
g₂       22.405 (3.427)   23.94? (3.7?3)   0.959
g₃       23.442 (3.598)   25.589 (3.814)   29.769 (4.335)

Note. Standard errors are in parentheses. Values above the diagonal are
factor correlations standardized independently in each age group.


of stability in the primary ability (test-specific) domain. For
example, although only about 14% of the young group's variance
of Space at Time 1 is determined by g, 72% of Space's Time 1
variance is determined by the Space test-specific factor in the
young group. This indicates substantial stability in both the g
and test-specific domains. Proportions of stable test-specific
variance to total g-adjusted variance are given in the right-hand
column of Table 9. Considering that these proportions are
contaminated by measurement error, the proportion of stable
variance in the primary ability measures independent of g is indeed
impressive. Finally, the unique variances show some evidence of
change in the primary abilities, but in many cases the proportions
of unique variance are close to what would be expected to be
the magnitude of error variance, given the reliabilities of the
measures reported by Thurstone and Thurstone (1949).


Table 8

Correlations and Regression Coefficients Indicating Stability
of Individual Differences in g

Group          Factors   r^a    r^2    1 - r^2   Regression^b   Change regression^c

Young          g1, g2    .887   .787   .213      0.791          -0.209
               g2, g3    .933   .870   .130      0.894          -0.106
Middle aged    g1, g2    .927   .859   .141      0.965          -0.035
               g2, g3    .959   .920   .080      1.000           0.000
Old            g1, g2    .944   .891   .109      0.952          -0.048
               g2, g3    .959   .920   .080      1.069           0.069

Note. Stabilities are shown for 7-year intervals between adjacent
longitudinal occasions.
a Simple correlation of scores for adjacent occasions.
b Simple regression of later occasion on earlier occasion (unstandardized).
c Regression of change score on earlier occasion (unstandardized).
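
The stability indices defined in the notes to Table 8 can be derived from three numbers: the factor variances at two adjacent occasions and their covariance. The following is a minimal sketch, not the authors' procedure; the helper name `stability_summary` is hypothetical, and the inputs are the young group's g2 and g3 variances and covariance as read from the factor covariance matrix reported for that group.

```python
import math

def stability_summary(var_early, var_late, cov):
    """Stability indices for one pair of adjacent occasions (hypothetical helper)."""
    r = cov / math.sqrt(var_early * var_late)   # simple correlation
    b = cov / var_early                         # unstandardized regression, later on earlier
    return {
        "r": r,
        "r_squared": r * r,
        "one_minus_r_squared": 1.0 - r * r,
        "b": b,
        # Regressing the change score (later - earlier) on the earlier score
        # gives cov(later - earlier, earlier) / var(earlier) = b - 1.
        "b_change": b - 1.0,
    }

# Young group, g2 and g3: variances 11.959 and 10.970, covariance 10.690.
young_g2_g3 = stability_summary(var_early=11.959, var_late=10.970, cov=10.690)
for name, value in young_g2_g3.items():
    print(f"{name}: {value:.3f}")
```

Under these inputs the correlation and the two regression coefficients agree with the young group's g2, g3 row of Table 8 to within rounding.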

Table 9

Estimated Variance Components From Final Multiple Groups Model

Variable   σ²   g^a   Test-specific^b   Unique^c   Stable (test)^d

Young 

V, 

76  :»6 

.543 

.333 

.124 

.729 

s, 

98  688 

13’ 

.724 

139 

.839 

R, 

27  518 

54" 

.186 

.268 

,4|0 

N 

1  14  10" 

282 

.591 

.12’ 

.823 

W, 

136  09’ 

19Q 

.332 

4’0 

.4  14 

Vj 

62  008 

531 

.385 

.084 

.821 

s 

103 

104 

.689 

.208 

.768 

R- 

2’  832 

430 

.325 

.246 

.569 

N. 

1 1 5  82$ 

.221 

.576 

.203 

.739 

W, 

148  66" 

.  i  44 

.683 

.171 

.798 

Vs 

63  58’ 

4"5 

.354 

Pi 

.674 

s, 

104  8’2 

rw4 

.634 

.274 

.698 

R, 

:4  025 

45“ 

.318 

.225 

.586 

N, 

9t  :*! 

.244 

.652 

.104 

.862 

W  j 

159  8*9 

123 

620 

.25' 

.713 

Middle  aged 

\ 

--  ;r.y 

50C 

.273 

128 

.681 

s 

0  -  **  t  vj 

153 

468 

36« 

.559 

R 

?:  4 "  i 

<  !  " 

.299 

184 

.44$ 

S 

i:o  38" 

299 

.58° 

.112 

840 

v. , 

154  **! 

195 

.502 

.304 

.623 

\ . 

81  845 

564 

.304 

.133 

.696 

s 

8:  420 

183 

.680 

.13' 

832 

p 

f"6 

5“h 

266 

15' 

.629 

V 

; 2"  9Q- 

.2*" 

.599 

1 2  i 

.832 

V.  . 

i:< 

24*. 

564 

196 

.  1 42 

\ . 

8  ’  > 

6" 

.206 

1  '8 

.536 

s 

1  *  i 

.38" 

.422 

4“8 

R, 

V  5a-  i 

55a- 

256 

148 

.634 

N, 

lt~  3r  1. 

3<» 

528 

116 

820 

V*  V 

■  ■  C  <  :  • 

2“  • 

505 

.221 

69; 

Old 

\ 

k  :  i*m 

6  34 

.08’ 

.’’8 

.238 

8  "•  "84 

2 c ' 

.348 

400 

46  5 

R 

u  yi 

68  5 

105 

.210 

33? 

N 

l  1  o  A-Wf- 

42 ; 

44  1 

.138 

.’62 

W 

j  (v  \ 

.516 

.226 

.695 

\  ; 

1  .  4  04 

_113 

184 

.243 

4V 

5- 

428 

2’8 

.292 

.431 

404 

R- 

v  no 5 

66  5 

.201 

.134 

.600 

N- 

129  34“ 

.396 

466 

138 

112 

V.  . 

’8’ 

281 

.505 

.214 

.702 

S  , 

12*  "24 

64“ 

.182 

.r: 

.514 

S, 

'4  625 

356 

.385 

.258 

.599 

R. 

38  or 

'83 

.104 

.113 

479 

N 

I J9  2i  I 

.534 

.313 

.15? 

6'2 

v., 

151  523 

353 

438 

.208 

.678 

Note. σ² = estimated variance of observed variable. V = Verbal Meaning;
S = Space; R = Reasoning; N = Number; W = Word Fluency. Subscripts
on variables indicate longitudinal occasion (1 = Time 1, 2 = Time 2,
3 = Time 3).
a Proportion of variance due to g.
b Proportion of variance due to test-specific factor.
c Proportion of variance unique to the observed variable. The sum of the
three proportions (g-related, test-specific, unique) is 1.0.
d Proportion of variance not determined by g that is determined by the
test-specific factor.
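
The relation between the columns of Table 9 can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `stable_test_proportion` is hypothetical, and the example row is Verbal Meaning at Time 1 in the young group as read from the table.

```python
def stable_test_proportion(g_share, test_specific_share):
    """Share of the variance not determined by g that is carried by the
    test-specific factor (the right-hand column of Table 9)."""
    return test_specific_share / (1.0 - g_share)

# Verbal Meaning, Time 1, young group: g, test-specific, and unique proportions.
g_share, test_share, unique_share = 0.543, 0.333, 0.124

total = g_share + test_share + unique_share      # the three proportions sum to 1.0
stable = stable_test_proportion(g_share, test_share)

print(f"sum of proportions: {total:.3f}")
print(f"stable (test) proportion: {stable:.3f}")
```

For this row the computed stable proportion rounds to the tabled value of .729.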


Discussion 

The results of the present study present a relatively coherent
picture: one of measurement equivalence and stability in
psychometric intelligence, as measured by the Thurstones' 1948
Primary Mental Abilities test, in adulthood. We found that it
was highly plausible to model the factor loadings of a general
intelligence factor as being invariant, both longitudinally and
across multiple age groups. We also found a high degree of
stability of individual differences across the adult life span.

The finding of invariance in the g factor loadings is important
relative to the suggestion in the literature that the fundamental
measurement properties of the psychometric tests change over
the life span (e.g., Baltes & Nesselroade, 1970; Demming & Pressey,
1957; Schaie, 1977). As shown by Meredith (1964), under
selection of subpopulations from a population for which an
isomorphic common factor model holds, the multiple subpopulations
will have an invariant unstandardized factor pattern matrix.
Meredith's work implies that one must reject the hypothesis of
metric invariance before one is justified in concluding that the
groups have qualitatively different factor structures. One cannot
argue for qualitative group differences in measurement properties
if the hypothesis of metric invariance cannot be rejected. In
contrast, we found the hypothesis of metric invariance to be strongly
supported by our data. Our results therefore suggest that,
whatever the faults inherent in the constructs of psychometric
intelligence, measures of psychometric intelligence seem to be
measuring basically isomorphic constructs with similar measurement
properties at different age levels.

One could still, of course, argue that the constructs measured
by psychometric intelligence are of limited utility in predicting
intelligent behaviors in adults (e.g., Sternberg, 1985).
Nevertheless, our findings do not support the notion that psychometric
testing of abilities in older populations is invalid because one is
measuring qualitatively different constructs with unstable
measures. Our conclusion must be qualified by the fact that our
assessment of factorial invariance is specific to the second-order
g factor. We cannot assess the invariance of the primary ability
factor loadings from our data. We therefore cannot rule out the
possibility of nonequivalent measurement properties at the
primary ability level, although, given the stability indicated in the
test-specific factors, the likelihood of measurement equivalence
in the primary ability factors seems quite high. Data we recently
collected on an expanded ability battery as part of the 1984 SLS
assessment should help us address the measurement equivalence
issue at the primary ability factor level.

The finding of factorial invariance is relevant to the factor
analytic literature suggesting de-differentiation of ability factors
in old age (Reinert, 1970). The de-differentiation argument states
that ability factors coalesce, or collapse, toward a general
intelligence factor in older groups. The early literature on this
phenomenon was plagued by methodological inadequacies
(Cunningham, 1978; Reinert, 1970; Schaie & Hertzog, 1985). Recent
comparative factor analysis work by Cunningham (1980, 1981),
using confirmatory factor analysis methods, suggests that there
is little evidence for gross collapse of the factor space: the same
number of factors are needed to model ability variables in old
groups and the loading patterns are highly similar. Our results
are consistent with Cunningham's findings in suggesting
invariance in the raw-score regressions of variables on ability factors,
both across age groups and longitudinally within age groups (see
also Cunningham & Birren, 1980).

Cunningham (1980, 1981) reported evidence for a mild form
of de-differentiation, that is, increased factor correlations in the
older groups. Our finding of increased communalities for g in
the old group is also consistent with this mild form of
de-differentiation. To clarify the relation, we report in Table 10
correlations among the primary abilities obtained by a confirmatory
factor analysis specifying test-specific factors. As can be seen in
Table 10, there is a pronounced tendency for factor correlations
to be higher in the old group. Crude indexes of this tendency
are the average correlations of .36 for the young group, .39 for
the middle-aged group, and .54 for the old group. Nevertheless,
it must be emphasized that the primary thrust of the
de-differentiation argument (qualitative change in the nature of ability
factors) is supported neither by Cunningham's findings nor by
our own.
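
The "crude indexes" quoted above are simple averages of the ten off-diagonal correlations in each panel of Table 10. The sketch below reproduces them for the young and middle-aged groups. It is illustrative only: the values are read from Table 10 on the assumption that entries printed without a leading decimal point are ordinary correlations, and the old group is omitted because one of its entries is partly illegible in this copy.

```python
from statistics import mean

# Off-diagonal factor correlations from Table 10, lower triangle read row by
# row (Space, Reasoning, Number, Word Fluency against the preceding abilities).
young = [0.115,
         0.559, 0.455,
         0.390, 0.239, 0.489,
         0.531, 0.034, 0.425, 0.334]
middle_aged = [0.296,
               0.711, 0.479,
               0.419, 0.248, 0.441,
               0.508, 0.039, 0.439, 0.308]

young_average = mean(young)
middle_aged_average = mean(middle_aged)

print(f"young: {young_average:.2f}")
print(f"middle aged: {middle_aged_average:.2f}")
```

Both averages round to the values given in the text (.36 and .39).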

The age-related measurement equivalence in the PMA allows
us to make unambiguous interpretation of the stability of
individual differences in g over time. Clearly, individual differences
in general intelligence are highly stable across 14-year longitudinal
epochs for three age groups (spanning most of the adult age
range). The stability coefficients indicated that approximately
90% of the g variance in the middle-aged and old groups was
consistent between adjacent 7-year testing intervals. There is,
then, little indication in these data of any substantial degree of
variability in developmental trajectories in g. Moreover, the
stability of individual differences in the PMA ability-specific
components in our longitudinal model suggests a high degree of
stability in individual differences on the primary abilities as well.

Although these results clearly limit the degree to which one
could argue for a substantial degree of interindividual differences
in intraindividual change in psychometric intelligence in
adulthood, it would be overstating the case to argue that these data
demonstrate a lack of variability in change functions across the
adult life span. For one thing, it is well-known that the
longitudinal samples of the SLS are influenced by a substantial degree
of experimental mortality (Schaie, Labouvie, & Barrett, 1973),
causing the participants in the 14-year studies to be relatively
select with respect to ability levels. It is highly likely, given the
relatively long 7-year retest interval and the nature of the sampling
procedures, that individuals in terminal decline or suffering
differential loss of abilities due to severe illness will have dropped
out of the longitudinal sample (Hertzog, Schaie, & Gribbin,
1978). The high degree of stability we observed in this study
may be specific to more select healthy subpopulations of adults
and may not generalize to the population at large. Moreover, our
sample size was sufficiently small that we were forced to pool
over relatively large age ranges to form our age groups. Such a
procedure maximizes individual differences at the initial
measurement occasion and may have obscured some degree of
heterogeneity in developmental trends. We note, however, that the
estimates of stability did not differ greatly between the Sample
1 analysis and the age-partitioned multiple group analysis that
reduced individual differences produced by wide age spans.

Of course, as McCall (1981) pointed out, even stabilities of .9
allow for a greater degree of crossover of individual curves than
might be expected by social scientists. At the individual level, it


Table 10

Primary Ability Factor Correlations for the Three Age Groups

Variable          Verbal Meaning   Space   Reasoning   Number           Word Fluency

Young (M age = 37)
Verbal Meaning    1
Space             .115             1
Reasoning         .559             .455    1
Number            .390             .239    .489        1
Word Fluency      .531             .034    .425        .334             1

Middle aged (M age = 49)
Verbal Meaning    1
Space             .296             1
Reasoning         .711             .479    1
Number            .419             .248    .441        1
Word Fluency      .508             .039    .439        .308             1

Old (M age = 65)
Verbal Meaning    1
Space             .593             1
Reasoning         .838             .650    1
Number            .666             .528    .62[illegible]   1
Word Fluency      .557             .290    .202        .450             1

is still possible that a given individual will buck the tide, and
exhibit less change in g than his or her same-age peers. There
may also be more variability in the primary abilities than in the
higher order intelligence factor. One can see in Table 7 that the
test-specific stabilities were in some cases smaller than the
stabilities for g in the same age interval. In the old group, for
example, the stability of the Space test-specific factor seems to be
smaller than the stability observed for Space in the young and
middle aged, even though the stability of individual differences
in g is, if anything, greater in the old group. This result may
indicate slightly more variability in the patterns for the Spatial
Orientation ability tapped by the Space test (see McGee, 1979).
These data are not optimally suited for assessing primary
ability-specific change, however, because unreliability due to
measurement error cannot be separated from instability in the ability in
the analysis we have reported. In any case, we must be careful
to emphasize that there is considerably more consistency than
inconsistency in age changes in all age groups, and for all PMA
subtests. Finally, we cannot rule out the possibility that individual
differences in change (or, for that matter, changes in factor
loadings) occur in old age (beyond 80) not represented in this
study.
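
McCall's point about crossover at stabilities near .9 can be made concrete. The sketch below is an illustration under an added assumption that is not in the original analysis: scores at two occasions are bivariate normal with correlation r, and two individuals are drawn independently. Under that assumption the probability that the pair reverses its rank order between occasions is arccos(r) / π. The function name `rank_reversal_probability` is hypothetical.

```python
import math

def rank_reversal_probability(r):
    """Probability that two independently drawn individuals swap rank order
    between two occasions, assuming bivariate normal scores with correlation r."""
    return math.acos(r) / math.pi

# Stabilities of roughly the size reported in Table 8, plus the .9 benchmark.
for r in (0.887, 0.900, 0.959):
    print(f"r = {r:.3f}: P(rank reversal) = {rank_reversal_probability(r):.3f}")
```

Even at r = .9 roughly one pair in seven changes order under this model, which is consistent with the caution that high stability still permits individual curves to cross.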

The invariance in the PMA g factor loadings and the stability
of individual differences in intelligence contrast sharply with
patterns of mean age changes found in the SLS (e.g., Schaie,
1983; Schaie & Hertzog, 1983). Schaie has consistently found
variation in mean patterns according to age, cohort, and time
of measurement. Moreover, these mean changes have been found
to vary in magnitude for different abilities. The difference in
findings underscores the critical distinction between stability in
means (i.e., on average, no age changes) and stability of individual
differences. In normally distributed variables, stability of the
means and stability of individual differences (as measured by
covariances) are statistically (and conceptually) independent. As
one can see in the next article in this series (Hertzog & Schaie,
1986), we can observe stability of individual differences either
when there are no mean age changes or when there are substantial
mean changes over a given portion of the life span.
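
The independence of mean stability and stability of individual differences can be illustrated with a toy example. The scores below are hypothetical and are not SLS data: the same pattern of individual differences at a second occasion is paired once with no mean change and once with a uniform decline, and the correlation with the first occasion is identical in both cases.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation computed from deviations about the means."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical scores for five people at occasion 1.
time1 = [42.0, 47.0, 51.0, 55.0, 60.0]
# Occasion 2 with the same mean as occasion 1 (mean stability).
time2_stable_mean = [43.0, 46.0, 52.0, 54.0, 60.0]
# Occasion 2 with every score lowered by 6 points (mean decline).
time2_mean_decline = [score - 6.0 for score in time2_stable_mean]

r_stable = pearson(time1, time2_stable_mean)
r_decline = pearson(time1, time2_mean_decline)

print(f"mean change: {mean(time2_stable_mean) - mean(time1):+.1f}, r = {r_stable:.3f}")
print(f"mean change: {mean(time2_mean_decline) - mean(time1):+.1f}, r = {r_decline:.3f}")
```

Because the correlation depends only on deviations from each occasion's mean, shifting every score by the same amount changes the mean profile and leaves the stability of individual differences untouched.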

We analyzed data on psychometric intelligence from the Seattle Longitudinal Study, simultaneously
estimating longitudinal factors, their covariance structure, and their mean levels. Data on five
Thurstone Primary Mental Abilities subtests were available for 412 adults, ages 22-70 at first test, who
were tested three times at 7-year intervals. A previous longitudinal factor analysis had shown high
stability of individual differences (covariance stability) in general intelligence for three adult age
groups. We extended that model to estimate factor means. All three age groups showed high levels
of covariance stability, but differed sharply in their mean profiles. The young group showed
increasing levels of general intelligence, the middle-aged group had stable levels of intelligence, and the old
group showed salient, approximately linear, decline. The patterns of stability in middle age, followed
by mean decline and high covariance stability in old age, suggest a normative developmental
transition from a stability pattern to a decline pattern of general intelligence, with the inflection point
occurring somewhere around age 60.


An important issue in the study of adult intellectual
development concerns whether levels of intelligence remain stable with
advancing age. There is general agreement that the average level
of performance on certain psychometric measures of
intelligence declines with age, although there is great debate as to (a)
the ubiquity of decline, (b) the proper interpretation of decline
in psychometric performance when it occurs, and (c) the
practical importance of the magnitude of age-related decline (e.g.,
Baltes, Dittmann-Kohli, & Dixon, 1984; Botwinick, 1977;
Dixon, Kramer, & Baltes, 1985; Horn, 1985; Horn & Donaldson,
1976, 1980; Schaie, 198?). At the center of the
disagreements in the literature regarding aging and intelligence has been
Schaie's longitudinal studies of aging and primary mental
abilities (see Schaie, 1983). The debate between Horn, Schaie, and
others (e.g., Baltes & Schaie, 1976; Horn & Donaldson, 1976)
covered a large number of issues associated with Schaie's
sequential design, psychometric tests, and alternate theories and
interpretations of aging and intelligence. Subsequent work by
Schaie and Hertzog (1983) re-examined the issues with new
data from Schaie's sequential samples. Their cohort-sequential
analyses identified clear cohort differences in certain
psychometric tests and identified statistically significant changes in
multiple psychometrically defined abilities. For all five subtests
of Thurstone's Primary Mental Abilities (PMA; Thurstone &
Thurstone, 1949), declines in performance (whether measured
by longitudinal or cross-sectional sequences) were negligible
until after age 50. Declines that were observed after age 50 were
small, but became increasingly large after mean age 60. A
somewhat surprising result, given earlier cross-sequential results
from Schaie's data, was that the longitudinal sequences
suggested decline after mean age 60 in all PMA subtests, although
the decline began later for the PMA subtest Verbal Meaning (a
test of recognition vocabulary). Schaie and Hertzog (1983)
argued that these results required some minor modification of
previous positions regarding the age of onset of intellectual
decline, but that they supported the major conclusions of (a)
age-confounded cohort differences in cross-sectional studies, (b)
relative stability of mean performance levels into the 50s, with
substantial declines only after age 60, and (c) some differences
across subtests in the onset and magnitude of age-related
performance declines (see also Dixon et al., 1985).

We report data collected as part of the Seattle Longitudinal Study,
Correspondence concerning this article should be addressed to Chris

Although most of the gerontological literature has focused on
the issue of stability of mean levels of intelligence with aging,
mean stability is but one type of stability that can be assessed
in longitudinal data. Another important type of stability is
stability of individual differences (e.g., Baltes, Reese, &
Nesselroade, 1977; Kagan, 1980; Schaie & Hertzog, 1985). This
stability reflects the degree to which individuals differ in their
developmental patterns of change (Baltes et al., 1977; Nesselroade
& Labouvie, 1985; Schaie & Hertzog, 1985). Whereas stability
of means is reflected in equivalent mean values at different
developmental times, stability of individual differences is reflected
in the covariance of a variable with itself over two points in time
(see Baltes et al., 1977). In this article, we refer to stability of
individual differences as covariance stability (see Hertzog &
Nesselroade, 1987).

In a previous article, Hertzog and Schaie (1986)
demonstrated that there is substantial covariance stability in
intelligence across the adult life span. Hertzog and Schaie (1986) used
a longitudinal factor analysis of data from the Seattle
Longitudinal Study (SLS; Schaie, 1983) to show (a) that a general
intelligence factor, g, could be identified for three age groups (young,
middle-aged, and old), (b) that this g factor was defined
equivalently by the PMA subtests in each age group and showed
invariant factor loadings across longitudinal occasions, (c) that
the covariance stability of g was high in all age groups, with
longitudinal correlations of g with itself at or above .9 between
successive longitudinal occasions, even in the older group, and
(d) that there was substantial covariance stability in the five
primary ability subtests, independent of g, as reflected in the
proportion of variance in the PMA subtests determined by
"test-specific" factors.

Hertzog and Schaie's (1986) results support the hypothesis
that age changes in g are relatively consistent for same-aged
individuals. Although there are individual differences in change
patterns, these differences produce shifts in relative ordering of
individuals that are small relative to the overall population
variance in g. It is interesting that covariance stability was high in
age ranges in which Schaie and Hertzog (1983) detected decline
in the individual PMA subtests, namely, after age 60. This
finding suggests only modest individual differences in the
magnitudes of late-life decline in g.

We report a series of additional analyses designed to examine
explicitly the mean level stability of g and, simultaneously, to
estimate stability of individual differences in g. The results of
these analyses demonstrate the independence of these two types
of stability in the domain of psychometric intelligence. The
analyses also were used to examine the question of inflection
point for shifts from stability to decline in general intelligence.

The simultaneous examination of mean and covariance
stability in longitudinal data is made possible by use of structural
equation models to analyze means of latent variables (e.g.,
McArdle & McDonald, 1984; Sörbom, 1982). The longitudinal
factor analyses reported by Hertzog and Schaie (1986)
constitute an important precursor to simultaneous analysis of mean
and covariance structures. Hertzog and Schaie found metric
invariance in the g factor loadings between groups and across
longitudinal occasions of measurement. Metric invariance is
defined as equivalence in the unstandardized regression weights
of variables on factors (see Horn, McArdle, & Mason, 1984).
As discussed by several developmental methodologists (e.g.,
Baltes & Nesselroade, 1973; Labouvie, 1980a, 1980b; Schaie &
Hertzog, 1985), an assumption of metric invariance is essential
for allowing unambiguous interpretation of quantitative
differences in mean levels of factor scores. The demonstration of
metric invariance in g ensures that g is measured in equivalent units
of measurement, so that differences in g factor means are
uncontaminated reflections of mean level differences in the latent
variable (see Labouvie, 1980a, 1980b; Schaie & Hertzog, 1985,
for further discussion of this issue).

Given evidence of metric invariance, the simultaneous
analysis of means and covariance structures requires introduction of
the means into the structural equations of the longitudinal
factor model already used by Hertzog and Schaie (1986). The
critical questions of interest were (a) What is the magnitude of mean
age changes in g at the different age levels studied? (b) Do age
differences and age changes in g fully account for the mean
changes in PMA subtests, or must different developmental
trends of PMA means be modeled to account fully for the
information in the means? and (c) Is there evidence for independence
of stability of g means from the covariance stability of g?

Method 

Subjects 

The subjects in this study were participants in the Seattle
Longitudinal Study conducted by Schaie and his associates (Schaie, 1983). The
population consisted of members of a health maintenance organization
(HMO) in the greater Seattle area. The population was defined as all of
the members of the HMO as of 1956, the initial year of the longitudinal
study, in order to minimize the probability of selection differences over
time. All of the participants were unpaid volunteers who answered
questionnaires and took part in a single psychometric test session. The
participants, adults between the ages of 20 and 74 years at the first test,
represented a range of socioeconomic and ethnic groups (although the
population defined by the HMO membership in 1956 was
predominantly White and somewhat more affluent than the general Seattle
population). Further details on the population and sampling procedures
may be found in Schaie (1983).

Sequential  Sampling  Design 

The longitudinal samples studied here are a subset of the sequential
samples collected in the SLS. The sampling plan of the SLS is discussed
more fully in Schaie (1983), and the present sample is defined explicitly
in Hertzog and Schaie (1986). Briefly, we restrict our analysis here to
two 14-year longitudinal samples (first tested in 1956 or in 1963). Data
from the two longitudinal sequences were partitioned into a hybrid
sequential data matrix described in Table 1. The partitioned data matrix
forms three age groups for simultaneous analysis.

Variables 

As part of a larger psychometric battery, all of the subjects were
administered the 1948 version of the SRA Primary Mental Abilities Test,
Form AM 11-17 (Thurstone & Thurstone, 1949). The 1948 PMA
includes five subtests, all of which are timed and have significant speed
components in adult samples (see Schaie & Hertzog, 1983): (a) Verbal
Meaning, a test of recognition vocabulary; (b) Space, a test of spatial
relations requiring mental rotation of figures in a two-dimensional
plane; (c) Reasoning, a test of inductive reasoning requiring
recognition and extrapolation of patterns of letter sequences; (d) Number, a
test of the ability to solve simple two-column addition problems quickly
and accurately; and (e) Word Fluency, a test of the ability to retrieve
words from semantic memory according to an arbitrary syntactic rule
(words beginning with the letter a). Scoring followed the PMA manual.
Verbal Meaning and Reasoning were scored in terms of the number of
correct items, Space and Number were scored by subtracting incorrect
items (commission errors) from the total number of correct items, and
Word Fluency was scored by tallying the number of unique, admissible
words generated during the allotted time.

Models  and  Statistical  Procedures 

The longitudinal factor model used is an application of a generic
longitudinal model described in some detail by Jöreskog and Sörbom
(1977; see also Hertzog, in press; Horn & McArdle, 1980; Schaie &
Hertzog, 1985). A detailed description of the model may be found in

Table 1

Reparameterized Sequential Sample for Multiple Group Analysis

                 Cohort                      Age
Group/sample     (M birth year)   Occasion 1   Occasion 2   Occasion 3   n

Group 1
  1              1931             25           32           39           21
  1              1924             32           39           46           26
  2              1938             25           32           39           22
  2              1931             32           39           46           40
  M                               30           37           44
  Total                                                                  109

Group 2
  1              1917             39           46           53           27
  1              1910             46           53           60           32
  2              1924             39           46           53           51
  2              1917             46           53           60           50
  M                               42           49           56
  Total                                                                  160

Group 3
  1              1903             53           60           67           28
  1              1896             60           67           74           15
  1              1889             67           74           81           13
  2              1910             53           60           67           48
  2              1903             60           67           74           18
  2              1896             67           74           81           21
  M                               58           65           72
  Total                                                                  143


Hertzog and Schaie (1986). The model specified an occasion-specific g
factor at each longitudinal occasion. The factor covariance matrix
modeled the variances and covariances of g at the different occasions of
measurement, and the residuals in the PMA subtests were modeled as
having test-specific covariances (e.g., the residuals for Verbal Meaning
were allowed to covary across longitudinal occasions). The specification
of longitudinal models including factor means is relatively complex
(Jöreskog & Sörbom, 1984; McArdle & Epstein, 1987; Sörbom, 1982).
The critical features are (a) a vector of location constants, analogous to
grand means; (b) representation of latent variable means as regressions
on a fixed constant, modeled in the LISREL gamma parameter matrix;
and (c) the assumption that the means of all residuals are zero in
the population. The vector of location constants identifies an intercept
for each observed variable (PMA subtest). In longitudinal analysis of
multiple groups these location parameters are constrained equal both
across longitudinal occasions and between the multiple age groups.
Given data containing neither group differences nor longitudinal
changes in means, this location parameter vector would perfectly
account for the mean structure. Thus, the model with factor means will
be meaningful only if there are either group differences or longitudinal
changes in observed variable means that the model may attempt to
structure as a function of the factor means.
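The mean structure described above can be written as: implied subtest mean = location constant + loading x factor mean, with residual means assumed to be zero. The following is a minimal sketch of that relation only; the numeric values are invented for illustration and are not estimates from this study, and the function name is hypothetical.

```python
def implied_means(location, loadings, factor_mean):
    """Model-implied subtest means for one group at one occasion.

    location    -- one intercept per subtest, shared by all groups and occasions
    loadings    -- loading of each subtest on g
    factor_mean -- mean of g for this group at this occasion
    Residual means are assumed to be zero, as in the basic model.
    """
    return [tau + lam * factor_mean for tau, lam in zip(location, loadings)]


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    location = [10.0, 20.0, 15.0]
    loadings = [1.0, 0.5, 2.0]

    # With a zero factor mean the location vector alone reproduces the means.
    print(implied_means(location, loadings, 0.0))

    # A nonzero factor mean shifts every subtest in proportion to its loading.
    print(implied_means(location, loadings, 2.0))
```

The second call shows why the factor means carry information only when observed means differ across groups or occasions: without such differences, the location vector already accounts for every mean.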

Identification of the location parameters and the factor means is
achieved by fixing the mean of g to zero for one age group at one
longitudinal occasion. In the models reported, we fixed the g mean for the
middle-aged group at the first occasion (mean age 42) at zero. This
procedure then enables the remaining factor means to be estimated as
deviations from this reference point (see Jöreskog & Sörbom, 1984;
Sörbom, 1982, for additional details). The fact that factor means are
modeled as regressions of factors (i.e., g) on a constant requires the
assumption that the means of the residuals are zero. This is an unlikely
assumption, given that we expect age trends in mean levels to vary
across PMA subtests (independent of their relation to g). It is, however,
possible to estimate residual component means by moving these
parameters into the latent variable vector in LISREL (see Footnote 1).
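Why one factor mean must be fixed can be seen with a small numerical check: adding a constant to every factor mean and subtracting the loading-weighted constant from the location parameters leaves all implied means unchanged, so the two sets of parameters are not separately identified until a reference point is chosen. The sketch below uses invented values and hypothetical function names; it illustrates the algebra, not the LISREL specification used by the authors.

```python
import math


def predict(location, loadings, factor_mean):
    """Implied subtest means: location + loading * factor mean."""
    return [tau + lam * factor_mean for tau, lam in zip(location, loadings)]


def shift_solution(location, loadings, factor_means, c):
    """Add c to every factor mean and compensate in the location vector."""
    shifted_location = [tau - lam * c for tau, lam in zip(location, loadings)]
    shifted_means = [kappa + c for kappa in factor_means]
    return shifted_location, shifted_means


def anchor(location, loadings, factor_means, reference):
    """Re-express the solution so the reference factor mean is zero."""
    return shift_solution(location, loadings, factor_means,
                          -factor_means[reference])


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    location = [10.0, 20.0]
    loadings = [1.0, 0.5]
    factor_means = [3.0, 4.5, 1.0]

    new_location, new_means = anchor(location, loadings, factor_means, 0)
    print(new_location, new_means)

    # Every implied mean is unchanged by the re-anchoring.
    for old, new in zip(factor_means, new_means):
        before = predict(location, loadings, old)
        after = predict(new_location, loadings, new)
        assert all(math.isclose(a, b) for a, b in zip(before, after))
```

After anchoring, the remaining factor means read directly as deviations from the reference group and occasion, which is how the estimates in this report are scaled.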

All of the models were estimated in either LISREL V or VI (Jöreskog
& Sörbom, 1984) using maximum likelihood estimation. In structural
modeling, model fit can be assessed by likelihood ratio chi-square, as
well as relative fit indices provided by the program. These indices are of
less value in models with means, however, so we report a decomposition
of overall model fit into (a) fit of the covariance structure model and (b)
fit of the mean structure model (see Bentler & Bonett, 1980; Sobel &
Bohrnstedt, 1985). The relative fit index for the means may be
interpreted as an index of the proportion of information in the mean
structure, adjusted for location parameters, accounted for by the model.

The procedures used here are unabashedly exploratory in nature.
The goal is to use the LISREL model to explore descriptive developmental
hypotheses about the longitudinal mean and covariance structures of
the PMA subtests. This use of a generic longitudinal factor model is an
appropriate application of structural equation techniques, which
are ideal for exploratory multivariate modeling of longitudinal data
(Hertzog, in press; McArdle & Epstein, 1987). This study cannot and
should not be considered to represent a confirmatory analysis, in the
philosophical sense of the term.

Results 

The first model we estimated fixed the g factor means at zero
in all three age groups, but allowed all location parameters to
be freely estimated. This model fits the 15 means of each age
group with 15 freely estimated location parameters. There is


1 A listing of the LISREL VI specifications for models with factor and
residual means is available from the first author.

Table 2

Goodness-of-Fit for Longitudinal Factor Model With Means

Model                                           χ²       df     F(a)     p

Ms (saturated)                                287.68    248    .352    .048
M0 (null in means)                            642.02    288    .785    .000
M1 (g factor means)                           467.59    280    .572    .000
M2 (g factor means all 0 in middle-aged)      470.88    282    .575    .000
M3 (g and test-specific factor means)         338.76    270    .414    .003
M4 (g and residual means for V, S, N, W)      299.05    254    .366    .027

Note. V = Verbal Meaning; S = Space; R = Reasoning; N = Number;
W = Word Fluency.
(a) LISREL fitting function at minimum.


a one-to-one correspondence between location parameters and
sample means, and as such, the location parameters are
just-identified. This model is therefore saturated with respect to the
means, using Bentler and Bonett's (1980) definition. The fit of
the model, denoted Ms, is reported in Tables 2 and 3. As
expected, this model fit the same as the model ignoring means
reported by Hertzog and Schaie (1986), and yielded an identical
longitudinal factor solution. A second preliminary model,
following recommendations of Bentler and Bonett (1980), was a
null model in the means. This model specified five location
parameters, one for each PMA subtest, and constrained these
parameters to fit the means of all three longitudinal occasions for
all three age groups. Thus the 45 population means were fit
with five location parameters. This null model, M0, would have
a fit equal to the saturated model, Ms, if there were no group
differences or longitudinal changes in PMA subtest means to
structure as part of the analysis. There was, however, a
substantial, statistically significant difference between the two models,
as seen in the first model comparison reported in Table 3.
Clearly, there was longitudinal and age group variation in the
PMA means, and the task of the analysis was to structure this
variation in terms of the longitudinal factor model.
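The comparison between the null and saturated models is a likelihood ratio (chi-square difference) test. A minimal sketch follows, assuming the chi-square values and degrees of freedom reported in Table 2 for these two models (642.02 with 288 df; 287.68 with 248 df). The function names are illustrative, and the closed-form tail probability used here is valid only for even degrees of freedom, which happens to cover this comparison.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate, even df only.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi2_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference test for two nested models fit to the same data."""
    return chi2_restricted - chi2_general, df_restricted - df_general


if __name__ == "__main__":
    # Null model in the means versus the model saturated in the means.
    d_chi2, d_df = chi2_difference(642.02, 288, 287.68, 248)
    p = chi2_sf_even_df(d_chi2, d_df)
    print(round(d_chi2, 2), d_df, p)
```

The difference of 354.34 on 40 degrees of freedom matches the first comparison in Table 3, and its tail probability is far below any conventional significance threshold.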


The first substantive model of interest specified g factor
means in all three age groups. Interpretation of the fit of these
substantive models must be made on the basis of relative
differences from the null and saturated models, so that one can
evaluate fit to the means ignoring (assuming) the basic specification
and fit of the longitudinal factor model (Bentler & Bonett, 1980;
Sobel & Bohrnstedt, 1985). In essence, the difference between
the null and saturated models defines a range of possible fits
of models structuring means in the longitudinal analysis. The
critical question is how close a model with structured means
comes to the fit of the model that is saturated in the means (or
conversely, how far it has come from the poor fit of the null
model).

As shown in Table 3, this first substantive model, M1,
improved meaningfully on the fit of the null model, although there
was still a significant difference between M1 and Ms. The
relative fit of the new model is best indexed by the Sobel and
Bohrnstedt (1985) relative fit index, denoted as δ in Table 3. The fit of
.49 indicates that about half of the variation in the means had
successfully been structured by M1.
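The value of .49 can be reproduced from the chi-square values in Table 2 if the relative fit index is read as the share of the null-to-saturated chi-square range that a model recovers. That reading is an assumption here: it is consistent with the tabled values for M1 and M4, but the formula itself is not spelled out in the text, and the function name is illustrative.

```python
def relative_fit_to_means(chi2_model, chi2_null, chi2_saturated):
    """Share of the null-to-saturated chi-square range recovered by a model.

    Returns 0 for the null model and 1 for the model saturated in the means.
    """
    return (chi2_null - chi2_model) / (chi2_null - chi2_saturated)


if __name__ == "__main__":
    CHI2_NULL = 642.02        # null model in the means (Table 2)
    CHI2_SATURATED = 287.68   # model saturated in the means (Table 2)

    # Chi-square values as reported in Table 2.
    for label, chi2 in [("g factor means", 467.59),
                        ("g and residual means", 299.05)]:
        index = relative_fit_to_means(chi2, CHI2_NULL, CHI2_SATURATED)
        print(label, round(index, 3))
```

Because the index is normed against the two preliminary models, it isolates fit to the means from the fit of the covariance structure, which is identical across all of these models.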

One interesting outcome of model M1 was that the g factor
means for the middle-aged adults were not significantly
different from zero, relative to their standard errors. In models of this
type, these estimated factor means are scaled as deviations from
the fixed zero mean (age 42 for the middle-aged population).
Therefore, the finding of essentially zero g means at ages 49 and
56 for the middle-aged group indicated no statistically
significant change in mean level of g over this age range. A second
model, M2, incorporated this feature by fixing the g means to
zero for all three ages of the middle-aged group. This model did
not fit more poorly than M1.
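The degrees of freedom attached to the mean structure of each model follow from counting parameters: 45 observed means (5 subtests x 3 occasions x 3 groups) are fit with a set of location parameters plus however many g factor means are left free. The sketch below reproduces the degrees of freedom relative to the saturated model reported in Table 3; the function and constant names are illustrative.

```python
N_SUBTESTS = 5
N_OCCASIONS = 3
N_GROUPS = 3


def n_observed_means():
    """Number of sample means available across groups and occasions."""
    return N_SUBTESTS * N_OCCASIONS * N_GROUPS


def mean_structure_df(n_location, n_free_factor_means):
    """Degrees of freedom left in the means after fitting the parameters."""
    return n_observed_means() - (n_location + n_free_factor_means)


if __name__ == "__main__":
    all_g_means = N_GROUPS * N_OCCASIONS  # nine g means in total

    models = {
        # One location parameter per observed mean.
        "saturated": mean_structure_df(n_observed_means(), 0),
        # Five location parameters, no factor means.
        "null": mean_structure_df(N_SUBTESTS, 0),
        # One g mean fixed at zero for identification.
        "g means": mean_structure_df(N_SUBTESTS, all_g_means - 1),
        # All three middle-aged g means fixed at zero.
        "g means, middle-aged fixed": mean_structure_df(
            N_SUBTESTS, all_g_means - N_OCCASIONS),
    }
    for label, df in models.items():
        print(label, df)
```

The counts of 40, 32, and 34 agree with the degrees of freedom for the comparisons against the saturated model in Table 3.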

The fact that M2 fit significantly worse than Ms implied that
the assumption of no mean variation in the residuals for the
PMA factors had to be abandoned. That is, it was not possible
to model age-group differences and age changes in PMA means
solely as a function of age differences and age changes in g factor
means. Apparently, the primary abilities measured by the PMA
have variations in the means that are saliently different from the
behavior of the g factor means.

A logical possibility is that there are age group differences in
subtest-specific means, but no age group differences in patterns


Table 3

Comparisons of Fit Between Alternative Models With Factor Means

          M0 comparison     Ms comparison               Model comparison
Model    Δχ²(a)    Δdf     Δχ²(b)    Δdf    δ(c)     Comparison    Δχ²     Δdf    Δδ(d)

M0         —        —        —        —      —       M0–Ms       354.34    40      —
M1       174.41     8      179.91    32     .492       —            —       —      —
M2       171.94     6      182.40    34     .485     M2–M1         2.49     4     .007
M3       303.26    18       51.08    22     .857     M3–M1       128.83    10     .365
M4       342.97    34       11.37     6     .968     M4–M1       168.54    28     .48?

(a) Difference in χ² between model and M0 (null model).
(b) Difference in χ² between model and Ms (saturated model).
(c) Relative fit index for fit to the mean structure.
(d) Change in relative fit index in means for models under comparison.
A question mark marks a digit that is illegible in the source copy.

of age changes in the primary ability means. Such a pattern
could arise if age changes in the primary abilities were solely a
function of age changes in g, but there were also differential
patterns of cohort effects across the primary ability means. Our
previous work (Hertzog & Schaie, 1986), modeling both g and
PMA test-specific factors, provided a convenient means of
testing this hypothesis. We used a model that specified eight factors
in each age group: (a) three g factors, one at each longitudinal
occasion, and (b) five test-specific factors, one for each PMA
subtest. We estimated factor means for all eight factors,
achieving identification of the test-specific factor means by fixing all
five test-specific factor means for the middle-aged group to zero.
This model, M3, allowed the g factor means at ages 49 and 56
to be freely estimated in the middle-aged group, as in model
M1. We did not wish to assume mean stability in g, even though
that was suggested from the M2–M1 comparison. It could have
been the case that the stable g factor means in the middle-aged
group in the previous models were an artifact of model
misspecification.

Model M3 also constrained the test-specific factor loadings to
be equal over the three age groups (see Hertzog & Schaie, 1986).
The equality constraints on test-specific factor loadings did not
permit any of the age-group differences in mean changes to be
modeled by the test-specific factor means. Group differences in
mean change on the PMA variables could only be reflected in
the g factor means.

Table 2 reports the fit of M3. The model fit significantly better
than M1, indicating there were statistically significant age group
differences in test-specific factor means. However, the model
still did not approximate the fit of Ms, requiring rejection of
Model M3. It was also still the case that the g factor means did
not differ significantly between ages 42 and 56 for the
middle-aged group. We concluded that there were age-group differences
in PMA subtest means, but that there are also differential age
changes for the PMA subtest means independent of g. We also
concluded that it was still plausible to maintain the assumption
of no age changes in g in the middle-aged group.

We next proceeded by fitting a series of models allowing
residual means. This approach was needed to allow for age-group
differences in patterns of mean age changes on the primary
abilities. This series of models proceeded in exploratory fashion.
Large mean residuals (differences between sample means for
the PMA subtests and PMA means predicted from the model
parameters) and salient LISREL modification indices were used
to indicate a need for structuring additional mean parameters.
Unlike M3, these models specified a separate PMA residual
"factor" at each longitudinal occasion, permitting both g and
the PMA residuals from g to display age-related change. After
a series of model modifications, we arrived at a model that did
not differ significantly from the saturated model. This model
allowed residual means for Word Fluency, Number, Verbal
Meaning, and Space. This modified model, M4 in Table 2,
achieved a relative fit index of .97 to the means, indicating
excellent fit. Of course, this fit was achieved by adjusting to the
sample means, and can therefore be treated only as a descriptive
index of the success of the model modification process.

One of the major reasons for fitting additional models to the
means was to ensure that the estimated age changes and age
differences in g means were not inappropriately biased by the
incorrect assumption of no residual means. Hertzog and Carter
(1982) previously demonstrated that group differences in
intelligence factor means were affected by the specification error of
zero residual means. Table 4 reports the g factor means for the
four substantive models, M1 through M4. Irrespective of the
model, the relative pattern of g factor means in the three age
groups remained the same. The g factor means increased from
mean age 30 to mean age 37 in the young group, and then
remained relatively stable through age 44. The g factor exhibited
mean stability from mean age 42 through mean age 56 in the
middle-aged group. Finally, g showed substantial decline from
mean age 58 through mean age 72 in the old group. The mean
decline in g in the old group was roughly linear over the 14-year
period. The comparable pattern of g mean behavior is
particularly important in Model M4, in which it was most likely that
the apparent age changes in g estimated in Models M1 through
M3 would change as a function of specifying longitudinal
changes in the PMA residuals as well. The fact that conclusions
regarding the behavior of g means were not altered by specifying
longitudinal variation in PMA residual means indicated that
the mean patterns were unlikely to be an artifact of model
specification.

Approximate 99% confidence intervals around the factor
means can be calculated by subtracting and adding 2.5 SEs to
the estimated g factor means. Inspection of Table 4 clearly
showed that these 99% confidence intervals did not include zero
for any of the freely estimated means in the old and young
groups. As these means are deviation contrasts from the
middle-aged g means, we concluded there were reliable age group
differences in means. The significant differences included
comparisons between the different groups at roughly comparable
ages. That is, the young group at age 44 (Occasion 3) differed
significantly from the middle-aged group at age 42 (Occasion
1), as did the middle-aged group at mean age 56 (Occasion 3)
from the old group at mean age 58 (Occasion 1). Although the
hybrid sequential design does not completely unconfound age
changes and cohort differences, it seems likely that these
differences reflect cohort differences in the mean levels of g.
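The interval rule stated above (estimate plus or minus 2.5 standard errors) is easy to apply to individual entries of Table 4. The sketch below uses three estimates from the first substantive model as read from that table; the function names are illustrative.

```python
def approx_ci99(mean, se, multiplier=2.5):
    """Approximate 99% interval: estimate +/- 2.5 standard errors."""
    return (mean - multiplier * se, mean + multiplier * se)


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


if __name__ == "__main__":
    # (group and occasion, estimated g mean, SE), as read from Table 4.
    estimates = [
        ("young, occasion 1", 1.61, 0.60),
        ("middle-aged, occasion 2", 0.10, 0.17),
        ("old, occasion 1", -3.96, 0.61),
    ]
    for label, mean, se in estimates:
        interval = approx_ci99(mean, se)
        print(label, interval, excludes_zero(interval))
```

Because every mean is a deviation from the middle-aged group at Occasion 1, an interval that excludes zero indicates a reliable difference from that reference point rather than from any absolute level.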

Table 5 reports the residual means estimated in the final
model, M4. These means must be interpreted with care. They
represent mean patterns in the PMA subtests orthogonal to the
trends mediated through g. The first feature of note involves
the residual means for Word Fluency and Number in the
middle-aged group. Although the g means showed no age-related
changes in the middle-aged, the residuals for Word Fluency and
Number did change. There were small but statistically
significant declines in Word Fluency and Number between mean ages
42 and 56. There is a second noteworthy feature of the residual
means in Table 5. It seems that the large age-group (cohort)
differences in g overestimated age group differences in Number
and Verbal Meaning. This was shown by the large negative
means in the young group for these two PMA subtests, as well
as the large positive means for Number for the old group.
Finally, there appeared to be modest levels of decline in Space for
the old group (between mean ages 58 and 65) that was greater
than the decline in Space predicted by g.

We do not report here the other parameter estimates from the
longitudinal solution (e.g., factor covariances, factor loadings)
because they differed trivially from the solution without means

Table 4

The g Factor Means for Alternative Longitudinal Models

                                              Model
                        M1              M2              M3              M4
Group        M age    M      SE       M      SE       M      SE       M      SE

Young
  g1          30     1.61   0.60     1.62   0.59     8.54   3.26     2.82   0.65
  g2          37     2.76   0.57     2.78   0.57      ?     3.49     3.99   0.65
  g3          44     2.70   0.56     2.71   0.55     9.87   3.39     3.50   0.62

Middle-aged
  g1          42     0*      —       0*      —       0*      —       0*      —
  g2          49     0.10   0.17     0*      —       0.14   0.16     0*      —
  g3          56    -0.20   0.18     0*      —      -0.20   0.17     0*      —

Old
  g1          58    -3.96   0.61    -3.97   0.60   -10.96   4.48    -4.20   0.64
  g2          65    -4.61   0.61    -4.62   0.61   -12.41   4.64    -4.78   0.64
  g3          72    -6.55   0.65    -6.57   0.64   -13.28   4.24    -6.22   0.66

Note. Asterisks denote fixed factor means. The g factor subscripts denote
longitudinal occasion. A question mark marks a value that is illegible in
the source copy.


reported by Hertzog and Schaie (1986). However, one question
remained regarding the factor covariance matrix for g. As
reported in Hertzog and Schaie, there was an age-related increase
in g factor variance in the old group. The old group also had
greater overall variance in g than did the middle-aged and
young groups. One possible explanation of these differences is
that they are methodological artifacts. The old group was
formed by pooling over a larger age span in order to achieve
acceptable sample size for structural analysis (refer back to
Table 1). In the present context, it was possible that the
developmental changes in g factor means would differ if the youngest
age group (mean age 53 at Occasion 1; age range, 50 to 56) were
omitted from the analysis. To address this question, we
redefined the old group to include only the individuals age 57 and
older at first test, and re-ran the longitudinal model with this
subsample. Briefly, this analysis showed (a) similar age declines
in g means, but of greater magnitude, (b) higher variability in g
in the old group, but (c) more homogeneity of g variance across
the three longitudinal occasions. Thus, it appears that the
increasing variability in g over time, found in the full sample,
reflected differences in developmental patterns from ages 50 to
65, as opposed to heterogeneity of developmental trajectories of
same-aged individuals in the latter part of the adult life span.
The analysis thus provides further support for the argument of
an inflection point around age 60, at which age decrements in
PMA performance begin to accelerate. The increased
variability in g in the older group is not, however, merely a
methodological artifact of age-group definition.

Table 5

Residual Means in Final Model (M4)

                                          Age group
                           Young          Middle-aged           Old
Variable       Occasion   M       SE      M       SE         M       SE

Verbal Meaning    1        ?     1.0?    0*                  0.26   0.98
                  2      -4.??   1.0?    0*                  1.09   1.05
                  3      -3.65   1.03    0*                 -0.49   1.08

Space             1       0.5?   1.15    0*                 -1.19   1.01
                  2       0.0?   1.22    0*                 -2.68   1.01
                  3       1.7?   1.20    0*                 -2.56   1.03

Reasoning         1       0*             0*                  0*
                  2       0*             0*                  0*
                  3       0*             0*                  0*

Number            1      -5.56   1.32    0*                  3.71   1.23
                  2      -5.??   1.40    0.28    0.44        5.12   1.28
                  3      -6.03   1.31   -1.62    0.43        3.38   1.27

Word Fluency      1      -1.4?   1.48    0*                  4.98   1.45
                  2        ?     1.50   -1.43    0.68        2.77   1.46
                  3        ?     1.60   -2.08    0.69        2.36   1.49

Note. Asterisks denote fixed 0 parameters. A question mark marks a digit
or value that is illegible in the source copy.

Discussion 

The results from this analysis amplify and accentuate several
issues regarding age changes in psychometric intelligence. First,
the results extend Schaie's (1983) work on age patterns in
multiple primary intellectual abilities to the level of general
intelligence, as measured by the g factor defined from the PMA
subtests. We found a pattern of age changes in g factor means highly
consistent with previous univariate results (e.g., Schaie &
Hertzog, 1983). There were small increases in g in early adulthood
(through mean age 32), stability in g means through middle age
(until mean age 56), and substantial decline in late life. We
explicitly tested the hypothesis that there was no decline in g in
the middle-aged group at two different junctures, and could not
reject the hypothesis. Moreover, the age changes that were
estimated as part of this hypothesis test were so small as to be trivial
in importance. On the other hand, we did find evidence of some
decline in the middle-aged group on the PMA subtests Word
Fluency and Number, independent of g.

The results also suggest substantial cohort differences in g
means. The age groups differed not only in terms of mean age
at initial test but also in birth cohort membership. The fact that
the middle-aged group at mean age 56 performed significantly
better on g than did the old group at mean age 58 surely
indicates salient cohort differences in these data, as already detailed
by Schaie (1983).

The unique contribution of this study, in terms of estimating
age changes in PMA means, stems from the fact that the mean
differences are estimated at the level of the g factor. Because
these estimates are based on the simultaneously estimated
factor pattern weights, they represent optimal estimates of g factor
means that are not contaminated by mean patterns specific to
the primary abilities themselves. Moreover, the analysis
permitted the evaluation of mean trends in the primary abilities after
they have been residualized with respect to g.

An additional contribution of the present analysis is that it
permits independent evaluation of mean stability and
covariance stability in g. These results demonstrate concretely the
independence of these two types of stability. In all three age
groups, individual differences in g were highly stable over the
14-year period. Yet each age group showed dramatically
different age trends in g. In the young group, g increased to a stable
plateau. In the middle-aged group, g means remained stable,
but in the old group, substantial g decline was observed.

The change in mean patterns across the age groups, coupled
with the high degree of covariance stability across the life span,
has important implications for several prominent hypotheses
about adult intellectual development. It is often the case,
especially recently, that g is identified with basic intelligence (e.g.,
Jensen, 1982). Given (a) the widely accepted notion that there
is multidirectionality in age trends in ability, such that some,
but not all, abilities show age-related declines (e.g., Baltes et al.,
1984; Botwinick, 1977; Horn & Donaldson, 1980) and (b) the
accepted argument that it is measures of fluid intelligence
(Horn, 1985; Horn & Donaldson, 1980), or alternatively,
Wechsler-type performance tests (Botwinick, 1977; Salthouse,
1982) that manifest early decline, one would expect that g, as
measured here, would be the prime candidate for evidencing
decline from ages 25 to 55. To the contrary, it appears to be the
case that g manifests both mean stability and covariance
stability in middle age in the Seattle Longitudinal Sample.

How can this discrepancy be explained? One possible
explanation is that the g factor estimated by the PMA variables is
highly specific to the variables or to the samples, and hence is
in some way a poor measure of the construct of general
intelligence. This possibility seems relatively implausible. The g
factor loadings estimated here are highly consistent with those
found by Thurstone and Thurstone (1941) for these tests, and
show a pattern of loadings consistent with a plethora of studies
from the psychometric literature. The best indicator of g in the
PMA, judged from our factor loadings, is Reasoning. This
subtest, a measure of induction, is probably the best indicator of
general intelligence and of the Horn-Cattell second-order fluid
intelligence factor in the PMA (Horn & Donaldson, 1976). Not
only did the Reasoning test load highly on g, but the Reasoning
means in all age groups were well fit by the models specifying
no age-related changes in g in the middle-aged group. Although
we have estimated the single higher order g factor here, as
opposed to fluid intelligence, Gustafsson (1984) recently reported
hierarchical factor results from multiple intelligence tests that
suggest that the g factor is isomorphic with fluid intelligence.

Thus, it would seem that the hypothesis of early decline in g
is not supported by these data. The best model for the
development of g in middle age is a model of stability in both means
and individual differences. One could argue that the
generalizability of these results is limited because individuals who
manifest early decline are more likely to drop out of longitudinal
studies. Perhaps so, but the finding of mean stability of g, even
in a select subpopulation, argues against the ubiquity of early
age declines in g. There is evidence in these data of decline in
two PMA subtests, Word Fluency and Number, in the
middle-aged group. We suggest that, barring the sort of nonnormative
events that lead to early mortality, individuals appear to
maintain stable performance levels of g until sometime after age 50.

However, the developmental pattern of g begins to change
dramatically between ages 50 and 60. After mean age 58, we
found substantial, statistically significant decrements in mean
levels of g. This decline was observed in an age group in which
the covariance stability of g remained quite high. These results,
then, offer little support to the hope that age-related decline in
g is somehow nonnormative or is restricted to a small
subpopulation of older individuals. We did find increased variance in
g in the middle-aged and older groups, suggesting some small
differences in developmental trajectories between those
individuals in their 50s and those in their 60s. However, the
longitudinal increases in g variance in the older group, crucial to the
argument of different developmental trajectories in old age,
were eliminated when the old group was restricted to
individuals age 57 and older at first test.

The fact that it was necessary to fit residual mean factors,
varying in age patterns, provides support for the arguments of
Baltes and colleagues (e.g., Baltes et al., 1984) that intelligence is
both multidimensional and multidirectional in its development.
For example, the fact that young adults have lower means on
the Verbal Meaning residuals suggests that the g factor means
overestimate the age differences in vocabulary, even though
Verbal Meaning has high loadings on g. This pattern is also
observed for the Number and Word Fluency residual means, and
may suggest reversed cohort differences on these tests when g
is statistically removed from these tests. The pattern of Space
residual means in the old group indicates greater decline
between ages 58 and 65 on spatial ability than is predicted by g.
Some caution is in order in interpreting these residual means.
Our data only permit estimation of factor means for g. These
residual means do not have the same status as means estimated
in models with multiple measures of each primary ability, being
much more likely to be specific to the PMA subtest than would
primary ability factor means.

The analysis provides relatively little evidence of substantial
individual differences in intraindividual change in general
intelligence. To the contrary, these findings of differential age group
patterns in g means, coupled with high degree of covariance
stability in all age groups, suggest a relatively normative
developmental transition in g. That is, it appears that most
individuals make a transition from a stability to a decline pattern of g
development at some point between age 55 and age 70, with
individual differences in the age of onset of this transition.
It  is  important  to  note  that  these  inferences  are  based  on  pop¬ 
ulation  parameters,  and  that  there  are  some  individuals  who  do 
not  show  salient  decline  even  into  old  age  (Schaie,  1983).  There 
may  be  greater  heterogeneity  of  change  for  the  primary  abilities, 
as opposed to g (see Hertzog & Schaie, 1986). Nevertheless, the
results suggest that the heterogeneity of developmental trends in
g during old age is small when measured against the population
variance.

The high degree of covariance stability is a descriptive
phenomenon and should not be assumed to demonstrate the validity
of biological causes of age changes in g. Stability does not
imply immutability, and Schaie and Willis (1986) have demonstrated
significant training gains in inductive reasoning in individuals
with prior histories of decline in this ability (all of whom
were, in fact, part of the samples used in the present analysis).

In a sense, these results contradict aspects of the arguments
made by both sides of the debate regarding the nature of
intellectual decline manifested in the Seattle Longitudinal Study
(Baltes & Schaie, 1976; Horn & Donaldson, 1976). The results
appear, however, consistent with the updated perspectives of
both Horn (1985) and Baltes and his colleagues (e.g., Baltes et
al., 1984; Dixon et al., 1985). The key involves an assessment
of the kinds of abilities measured in timed psychometric tests
such as the Thurstone PMA, and hence, the nature of the g
factor extracted from it. Evidence from a number of studies
has shown that Thurstone-type tests of primary abilities have
high correlations with speed of basic perceptual processes in
adult samples (Cornelius, Willis, Nesselroade, & Baltes, 1983;
Hertzog, 198"; Horn, Donaldson, & Engstrom, 1981). Schaie
originally selected the adolescent form of the PMA for his study,
and this form has limited item difficulty and substantial speed
components in adult samples (e.g., Schaie, Rosenthal, & Perlman,
1953). The g factor estimated in this study was marked as
highly by PMA Verbal Meaning as by PMA Reasoning. We have
recently shown a strong relationship of PMA Verbal Meaning
to a Perceptual Speed factor independent of its relationship to
other vocabulary tests (e.g., ETS Advanced Vocabulary; Schaie,
Willis, Hertzog, & Schulenberg, 1987). Thus, it appears that the
PMA was constructed so as to maximize variance determined
by what might be termed the mechanics of intelligence (e.g.,
Hunt, 1978), that is, the speed of basic cognitive processes
needed for rapid decisions of low to moderate difficulty. Given
that age-related slowing in information-processing speed is a
highly normative developmental phenomenon (e.g., Birren,
1974; Salthouse, 1985), we can construct the following argument.
The PMA manifests little age change in g prior to age 55
because g, as operationally defined by the PMA, emphasizes
speeded solution of problems of limited difficulty. However,
sometime after age 50, the age-related slowing in information-
processing speed becomes a salient limiting factor in PMA
performance, and g begins to decline dramatically. Individual
differences in decline are minimized because (a) the PMA items
are not optimally sensitive to the type of cognitive processes
likely to maximize psychometric test performance in superior
old adults (e.g., strategies for solving difficult problems, cognitive
styles, and metacognitive processes; Baron, 1985; Dixon, in
press; Sternberg, 1985) and (b) the ability domain covered by


the tests is highly limited, excluding the types of abilities most
likely to show increment and differential growth in adulthood,
such as social cognition, domain-specific procedural knowledge,
expertise, and postformal reasoning (Berg & Sternberg,
1985; Dixon et al., 1985; Labouvie-Vief, 1985; Rybash, Hoyer,
& Roodin, 1986). Although important gains can be made by
studying these other domains of cognition, we maintain that the
study of cognitive mechanics, as they relate to performance on
intelligence tests, remains a continuing priority for gerontology.
A formal test of the cognitive mechanics interpretation of
psychometric test performance in adulthood requires investigation
of the nature of the information-processing skills tapped by
Thurstone-type tests, research now ongoing in several laboratories.

Text Recall in Adulthood: The Role of Intellectual Abilities

This study examined age-related predictive relationships between an array of
psychometric intellectual ability markers and text recall performance in adulthood.
One hundred and fifty women from three age groups (21-39 years, 40-58 years,
60-78 years) read and recalled four narrative stories at three delay intervals and
completed a battery of 12 factor-analytically defined intellectual ability tests. The
results indicated (a) that text memory performance in adulthood is predicted by
multiple abilities; (b) that age differences in text memory performance overlap
highly with age differences in multiple abilities, although the latter do not fully
account for the former; (c) that modest Age X Ability interactions exist but are
not consistent with previous reports suggesting that age differences decrease with
increasing ability levels; and (d) that the pattern of intelligence-text recall
relationships differs by age group.


Research  examining  the  development  of 
adult  memory  has  shown  that  the  existence 
of age-related differences in secondary memory
performance is widespread (Craik, 1977;
Poon, Fozard, Cermak, Arenberg, & Thompson,
1980). With few exceptions, younger
adults  routinely  outperform  older  adults  when 
the  focus  of  the  task  is  on  verbatim  recall  of 
lists  of  numbers,  symbols,  words,  and  so 
forth.  However,  when  the  focus  of  the  task  is 
on  the  gist  recall  of  meaningful,  presumably 
ecologically  valid  text  materials,  the  nature 
and  extent  of  age-related  performance  differ¬ 
ences  are  considerably  less  clear.  A  number 
of  recent  studies  have  reported  age-related 
deficits  in  text  processing  that  conform  to  the 
general  pattern  observed  in  verbatim  recall 
of word lists (Cohen, 1979; Dixon, Simon,
Nowak, & Hultsch, 1982; Taub, 1975, 1976;
Taub & Kline, 1978; Zelinski, Gilewski, &
Thompson, 1980). Other recent studies have
found that younger and older adults appear
to be equally adept at comprehending and
remembering texts (Harker, Hartley, & Walsh,
1982; Meyer & Rice, 1981).

More specifically, the presence or absence
of adult age differences in text processing
appears to depend on multiple contextual
factors (see reviews by Hultsch & Dixon,
1984; Meyer & Rice, 1983) including those
related to the task (e.g., recall, recognition),
materials (e.g., physical structure, organizational
structure), and subjects (e.g., abilities,
interests). For example, Simon, Dixon, Nowak,
and Hultsch (1982) found middle-aged
and older adults to be disadvantaged relative
to young adults when asked for incidental
recall of a text following performance of deep
orienting tasks; however, they performed
equally well following performance of a shallow
orienting task and under intentional recall
conditions. Similarly, in the case of material
variables, Dixon, Hultsch, Simon, and von
Eye (in press) found that age-related differences
in the discovery and use of the organizational
structure of texts depend, in part,
on the number of concepts introduced in the
text.

Although task and material variables play
an important role in accounting for adult
age-related performance differences in text
processing, a major portion of the variance
may be mediated by subject variables. For
instance, it is reasonable to expect that individual
differences in education and verbal
ability predict performance differences in text
recall and that such individual difference
variables may be related to the presence or
absence of age differences in text recall performance
(Meyer & Rice, 1983; Taub, 1979).
In this context, it may be noted that studies
reporting age differences in text recall have
generally tested subjects with relatively low
levels of education and verbal ability (i.e.,
high school graduates), whereas studies reporting
little or no age differences have generally
tested subjects with relatively high levels
of education and verbal ability (i.e., college
graduates).

In  a  recent  analysis,  Meyer  and  Rice  (1983) 
examined  text  recall  for  four  types  of  edu¬ 
cation/verbal  ability  subsamples  drawn  from 
a  large  sample  of  over  300  adults  who  had 
read  and  recalled  two  texts.  Their  analyses 
clearly indicated age differences favoring
younger  adults  in  populations  with  below 
average  or  average  verbal  ability  and  little 
education  beyond  high  school.  However,  they 
did  not  find  unequivocal  evidence  of  age 
differences  in  subjects  with  higher  levels  of 
verbal  ability  and  education.  On  the  one 
hand,  comparison  of  randomly  selected 
younger  adults  and  high-verbal  older  adults 
showed  no  significant  age-related  differences 
in  performance.  On  the  other  hand,  compar¬ 
ison  of  high-verbal  younger  adults  and  high- 
verbal  older  adults  revealed  age-related  per¬ 
formance  differences  in  favor  of  the  young. 

Similarly,  Dixon  et  al.  (in  press)  found  that 
verbal  ability  appears  to  mediate  age-related 
differences  in  the  discovery  and  use  of  the 
organization  of  texts.  Specifically,  in  the  case 
of  adults  with  relatively  low  levels  of  verbal 
ability,  age-related  differences  in  recall  were 
greatest  for  the  main  ideas  of  the  text.  Younger 
and  middle-aged  groups  did  not  differ  signif¬ 
icantly  in  recall  of  the  details  of  the  texts. 
However,  both  younger  groups  recalled  sig¬ 
nificantly  more  details  than  the  older  adults. 
Thus,  low-verbal  older  adults  showed  a  deficit 
in  recall  of  both  the  main  ideas  and  the 
details  of  the  text,  although  the  size  of  the 
deficit  was  greater  at  the  level  of  main  ideas 
than  at  the  level  of  details.  In  contrast,  in  the 
case of adults with high levels of verbal
ability, age differences in recall were greatest
for the details of the texts. There were no
significant differences among the three age
groups  in  the  recall  of  the  main  ideas. 

Although results like those of Meyer and
Rice (1983) and Dixon et al. (in press) suggest
that age differences in text recall interact
with level of verbal ability, there are limitations
to inferences drawn from extreme-groups
designs in which subjects are grouped
according to extreme scores (e.g., upper vs.
lower quartile) on a continuous variable. Age-
related selection in the population makes it
difficult to equate age/cohort groups partitioned
on variables such as educational attainment
and verbal ability (Krauss, 1980;
Meyer  &  Rice,  1983).  At  a  given  level  of 
education,  a  sample  of  older  adults  is  probably 
more  highly  selected  than  a  sample  of  younger 
adults  because  of  cohort-related  differences 
in  educational  attainment.  Similarly,  at  a 
given  level  of  verbal  ability,  a  sample  of  older 
adults  is  probably  less  highly  selected  than  a 
sample  of  younger  adults  because  of  age- 
related  changes  in  vocabulary.  In  any  case,  it 
is  virtually  impossible  to  disentangle  selection 
confounds  from  age  differences  produced  by 
the  aging  process,  even  though  the  latter 
source of variance is obviously the one of
interest.

There  are  other  potential  problems  with 
the  extreme-groups  approach.  The  extreme- 
groups design ignores strength of prediction
in the inner quartiles of the variables'
distributions. One could conclude that an Age x
Ability  interaction  in  an  extreme-groups  de¬ 
sign  indicated  progressively  smaller  age  dif¬ 
ferences  with  increasing  ability  levels,  when 
in  fact  the  age  differences  were  consistent  at 
all  but  the  very  highest  levels  of  ability.  A 
potential  overgeneralization  of  extreme-groups 
interactions  with  age  can  only  be  avoided  by 
examining  the  interaction  across  the  full  range 
of  the  ability  distribution.  An  additional 
problem  is  that  group  assignment  to  extreme 
groups  on  the  basis  of  scores  on  a  single 
fallible variable may cause measurement error
to have an unacceptably high influence on
the group assignment. Finally, other intellectual
abilities and individual differences variables
may mediate age differences in text
processing. A comparison of groups differing
on a single ability, however well measured,
cannot address the determination of individual
differences in text processing by a well-
defined domain of abilities.
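
The overgeneralization risk described above can be made concrete with a small numeric sketch. The recall gaps below are invented for illustration and are not data from any study cited here; they describe a hypothetical case in which the young-minus-old difference is constant across the three lower ability quartiles and shrinks only in the top quartile. The function names are likewise assumptions.

```python
# Hypothetical young-minus-old recall gaps (percentage points) by
# ability quartile; invented numbers, not data from the studies cited.
age_gap_by_quartile = {1: 10.0, 2: 10.0, 3: 10.0, 4: 2.0}


def extreme_groups_contrast(gaps):
    """Change in the age gap from the lowest to the highest quartile only."""
    return gaps[max(gaps)] - gaps[min(gaps)]


def adjacent_changes(gaps):
    """Change in the age gap between each pair of neighboring quartiles."""
    quartiles = sorted(gaps)
    return [gaps[hi] - gaps[lo] for lo, hi in zip(quartiles, quartiles[1:])]


# What an extreme-groups design can see: one contrast between Q1 and Q4.
extreme_only = extreme_groups_contrast(age_gap_by_quartile)

# What the full ability range shows: where the change actually occurs.
full_range = adjacent_changes(age_gap_by_quartile)
```

Taken alone, the extreme-groups contrast is compatible with a steady narrowing of the age gap as ability rises, whereas the full-range view locates the entire change between the third and fourth quartiles.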

The  present  study  was  designed  to  examine 
relationships  between  text  recall  performance, 
age,  and  a  selected  set  of  psychometric  intel¬ 
lectual  abilities  using  a  multivariate  correla¬ 
tional  approach.  More  specifically,  we  sought 
to  relate  text  recall  performance  at  three 
delay  intervals  to  a  set  of  primary  mental 
ability  factors  of  intelligence  (selected  on  the 
basis  of  their  potential  theoretical  relevance 
to  text  processing):  Induction,  Memory  Span, 
Associative Memory, Associational Fluency,
Ideational  Fluency,  and  Verbal  Comprehen¬ 
sion. 

The  present  analysis  has  two  major  foci. 
First,  we  wished  to  determine  whether  there 
is  an  interaction  between  multiple  intellectual 
abilities  and  age  in  determining  individual 
differences  in  text  recall  performance,  thus 
extending  the  logic  of  the  previous  extreme- 
groups  studies  to  a  regression  analysis  of  the 
interactive  relationship.  Our  analysis  addresses 
the  potential  deficiencies  in  the  extreme- 
groups  paradigm  by  (a)  examining  the  inter¬ 
action  at  the  level  of  intellectual  ability  factors 
rather  than  single  ability  variables  and  (b) 
producing  product  variable  interaction  terms 
that examine the interaction of ability and
age  across  the  range  of  the  continuous  distri¬ 
butions  of  abilities  rather  than  in  extreme 
groups.  The  regression  analysis  thus  allows 
us  to  determine  whether  age  differences  in 
text  recall  are  statistically  independent  of  age 
differences  in  intellectual  abilities,  while  an¬ 
alyzing  whether  any  statistically  independent 
age  differences  are  qualified  by  the  existence 
of Ability x Age interactions. Based on previous
studies,  we  predicted  that  there  would  be 
Age  x  Ability  interactions  in  text  memory 
performance,  with  smaller  age  differences  at 
higher  ability  levels. 
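
The product-variable logic described above can be sketched as an ordinary least-squares regression of recall on age, ability, and their product. This is a minimal illustration under stated assumptions, not the analysis actually reported: the function names, the mean-centering of both predictors, and the noise-free toy data are all hypothetical. A positive coefficient on the product term corresponds to the predicted pattern of smaller age deficits at higher ability levels.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_moderated_regression(age, ability, recall):
    """OLS of recall on centered age, centered ability, and their product."""
    n = len(recall)
    mean_age = sum(age) / n
    mean_ability = sum(ability) / n
    rows = []
    for a, b in zip(age, ability):
        ca, cb = a - mean_age, b - mean_ability
        rows.append([1.0, ca, cb, ca * cb])  # last column is the product term
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, recall)) for i in range(k)]
    names = ["intercept", "age", "ability", "age_x_ability"]
    return dict(zip(names, solve(xtx, xty)))


# Hypothetical noise-free data on a 3 x 3 grid of ages and ability scores.
age_col, ability_col, recall_col = [], [], []
for a in [25.0, 45.0, 65.0]:
    for b in [-1.0, 0.0, 1.0]:
        age_col.append(a)
        ability_col.append(b)
        recall_col.append(20.0 - 0.2 * (a - 45.0) + 1.5 * b + 0.05 * (a - 45.0) * b)

coefs = fit_moderated_regression(age_col, ability_col, recall_col)
```

Because the product term is estimated from every observation, the interaction is evaluated across the full ability distribution rather than only in its tails.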

The  second  focus  of  the  study  involved  an 
analysis  of  individual  differences  in  text  recall/ 
intellectual  ability  relationships  within  each 
age  group.  We  predicted  that  the  patterns  of 
text  recall-intelligence  correlations  would  vary 
with  age  and  the  length  of  the  recall  delay 
interval.  If  true,  these  predictions  would  in¬ 
dicate  an  important  qualification  to  any  in¬ 
terpretation  of  Age  x  Ability  interactions  in 


text  performance,  because  different  abilities 
might  be  important  for  performance  at  dif¬ 
ferent  stages  of  the  life  span. 

Method 

Subjects 

The subjects were 150 community-dwelling white
women from a small city in central Pennsylvania. They
were recruited through the Altoona campus of The
Pennsylvania State University and local organizations,
such as churches and senior citizen centers. The subjects
were paid Jli.OO for their participation in the study.

The sample was divided into three age groups of 50
individuals. The youngest group ranged in age from 21
to 39 years (M = 32.03), the middle group ranged in age
from 40 to 58 years (M = 49.48), and the oldest group
ranged in age from 60 to 78 years (M = 68.9). The
three age groups differed significantly in years of education
(young: M = 13.62; middle-aged: M = 12."8; old: M =
10.98), F(2, 147) = 14.*7, p < .001. In order to examine
these differences further, the sample was broken down
into semi-decade age groups, and the median educational
attainment of these age groups was compared to that
reported for these cohorts by the U.S. Bureau of the
Census (1977). These comparisons suggested that the
subjects of the present study approximated the educational
attainment characteristics of their respective cohorts,
with the exception of the youngest (20-24 years) and
oldest (75-79 years) groups, which had approximately 3
more years of education than expected.

The subjects were also asked to provide a subjective
evaluation of their own health, vision, and hearing compared
to other people their age. At least 90% of the
subjects in all three age groups rated themselves as
moderately good, good, or very good on these characteristics.

Ability  Measures 

The ability measures consisted of a battery of 12 tests
selected to represent six primary mental ability factors:
Induction, Memory Span, Associative Memory, Associational
Fluency, Ideational Fluency, and Verbal Comprehension
(Ekstrom, French, Harman, & Dermen, 1976).
The factors, representing several aspects of verbal intelligence
and memory, were chosen on the basis of their
potential relevance to memory performance (Horn &
Donaldson, 1980; Hultsch, Nesselroade, & Plemons,
1976). Two specific tests representative of each primary
mental ability were selected from published batteries,
yielding a battery of 12 tests, shown in Table 1. In some
instances, the format of the tests was modified slightly in
order to clarify the instructions and simplify the response
modes for older adult subjects. None of the modifications
was considered extensive enough to affect the measurement
validity of the tests.

Text  Materials 

The text materials consisted of four narratives, each
approximately 500 words in length. The narratives were
abstracted from magazine articles, and each dealt with a
life event experienced by a central female character. The
events included bearing a first child, recovering from an
injury sustained in an automobile accident, returning to
school and beginning a new career, and coping with a
family financial problem.

Table 1
Intellectual Ability Measurement Battery

Primary mental ability    Representative marker test         Source
Induction                 Letter Sets Test*                  Ekstrom, French, Harman, & Dermen (1976)
Induction                 Letter Series Test                 Thurstone (1962)
Memory Span               Visual Number Span*                Ekstrom et al. (1976)
Memory Span               Auditory Number Span Backwards*    After Ekstrom et al. (1976)
Associative Memory        Object Number Test*                Ekstrom et al. (1976)
Associative Memory        Memory for Words II                Kelley (1964)
Associational Fluency     Controlled Associations Test*      Ekstrom et al. (1976)
Associational Fluency     Figures of Speech Test*            Ekstrom et al. (1976)
Ideational Fluency        Topics Test*                       Ekstrom et al. (1976)
Ideational Fluency        Theme Test*                        Ekstrom et al. (1976)
Verbal Comprehension      Vocabulary I*                      Ekstrom et al. (1976)
Verbal Comprehension      Advanced Vocabulary*               Ekstrom et al. (1976)

* Part I only.

Kintsch's (1974) system was used to represent the
meaning  of  the  texts.  Within  this  system,  the  meaning 
of  a  text  is  represented  by  a  structured  set  of  propositions 
known  as  a  text  base.  A  proposition  consists  of  a  predicate 
and  one  or  more  arguments.  Predicates  tend  to  be  verb 
forms  and  specify  a  relation  among  the  arguments. 
Arguments are word concepts or other propositions
themselves. A propositional analysis of each text was
done according to the criteria developed by Kintsch
(1974) and elaborated by Turner and Green (1978). Each
of  the  texts  contained  from  221  to  248  propositions. 
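
The structure of a text base as described above (a predicate plus one or more arguments, where an argument may itself be a proposition) can be sketched as a small recursive data type. This is an illustrative assumption about how such a representation might be coded, not the notation of the cited scoring manuals; the class name, the `depth` helper, and the example sentence are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Proposition:
    """A predicate applied to one or more arguments.

    Each argument is either a word concept (a string) or another
    Proposition, which allows propositions to be embedded.
    """
    predicate: str
    arguments: Tuple[Union[str, "Proposition"], ...]

    def __post_init__(self):
        if not self.arguments:
            raise ValueError("a proposition needs at least one argument")

    def depth(self) -> int:
        """Nesting depth: 1 when every argument is a word concept."""
        nested = [a.depth() for a in self.arguments if isinstance(a, Proposition)]
        return 1 + max(nested, default=0)


# Hypothetical sentence: "Mary returned to school because she wanted a new career."
p1 = Proposition("RETURN", ("MARY", "SCHOOL"))
p2 = Proposition("WANT", ("MARY", "CAREER"))
p3 = Proposition("NEW", ("CAREER",))
p4 = Proposition("BECAUSE", (p1, p2))  # arguments are themselves propositions

text_base = [p1, p2, p3, p4]
```

A complete text base for one of the narratives would be a list of this kind with roughly 221 to 248 entries.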

Procedures 

The text recall and ability tasks were administered to
small groups of 3 to 10 individuals over three occasions.
During the first session, the subjects were asked to read
and remember the texts and to complete four of the
ability measures. The texts were presented in typewritten
booklets. The order of the stories was partially counterbalanced,
with each text occurring once in each ordinal
position. Prestige Pica 10-pitch type was used in order
to minimize possible sensory difficulties. The subjects
were instructed to read each of the four texts at their
own pace. Recall was tested after each text with subjects
writing their recall on lined pages in the booklet. It was
emphasized to the subjects that verbatim recall was not
required. Following the text recall task, all subjects
completed the Vocabulary I, Controlled Associations,
Letter Sets, and Advanced Vocabulary tests. The tests
were administered in invariant order and under the time
limits specified in the original source.
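
One way to obtain a partial counterbalancing in which each text occurs once in each ordinal position is a cyclic Latin square. The sketch below is an assumption about how such orders could be generated; the report does not state which four orders were used, and the story labels and the rotation rule for assigning participants are hypothetical.

```python
def cyclic_latin_square(items):
    """Return len(items) orders; each item appears once in each position."""
    n = len(items)
    return [[items[(row + pos) % n] for pos in range(n)] for row in range(n)]


# Hypothetical short labels for the four narratives.
stories = ["birth", "injury", "career", "finances"]
orders = cyclic_latin_square(stories)


def order_for(participant_index):
    """Assign presentation orders to participants in rotation (an assumption)."""
    return orders[participant_index % len(orders)]
```

The design is only partially counterbalanced because 4 of the 24 possible orders are used, so each text appears once in each position but not every sequence of texts occurs.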

One week following the original session, the subjects
were asked to remember the texts again and to complete
the remaining eight ability measures. For the second
recall test, the title of each narrative was printed on a
lined page of the recall booklet, and the subjects were
instructed to write their recall on the page. Again, it was
emphasized to the subjects that verbatim recall was not
required. Following the text recall task, all subjects
completed the Theme, Letter Series, Memory for Words,
Visual Number Span Forward, Figures of Speech, Object
Number, and Auditory Number Span Backwards tests.
The tests were administered in invariant order and under
the time limits specified in the original source.

Finally, 4 weeks following the original session, the
subjects were asked to remember the texts a third time.
The procedures followed for this final recall test were the
same as those used for the second recall test.


Recall  Protocol  Scoring 

Each recall protocol was checked against the propositions
of the original text base in order to determine
whether each proposition was expressed in the protocol.
In the scoring system used, a proposition was scored as
correctly recalled if it contained the "gist" of the proposition's
meaning (Turner & Green, 1978). Thus, overspecified
or generalized relations and arguments were scored
as correct (if substantively correct). If the subject made
an error in one proposition and then repeated the error
in a subordinate proposition, the subordinate proposition
was scored as correct to avoid counting errors more than
once.

A separate study was conducted to determine the
interrater reliability of this scoring system. Twelve protocols
were randomly selected for each of the four stories
and independently scored by two scorers. With an average
of 230 propositions per story, there were approximately
2,760 scoring decisions made by each scorer. There was
93.9% agreement between the two scorers on whether a
proposition should or should not be scored as correctly
recalled. During the course of the scoring, one of the
scorers had to be replaced. Accordingly, interrater reliability
was assessed a second time for the new pair of
scorers using the same procedures as before. The analysis
revealed 93.8% agreement between the two scorers.
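
The two quantities used in scoring, the percentage of propositions recalled and the percentage agreement between scorers, reduce to simple proportions. The sketch below assumes each scorer's decisions are stored as one boolean per proposition; the function names and the five-item toy protocol are hypothetical and are not the study's data.

```python
def percent_recalled(scored):
    """Percentage of text-base propositions scored as correctly recalled."""
    if not scored:
        raise ValueError("no scoring decisions supplied")
    return 100.0 * sum(1 for s in scored if s) / len(scored)


def percent_agreement(scorer_a, scorer_b):
    """Percentage of propositions on which two scorers made the same decision."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("scorers must judge the same non-empty set of propositions")
    matches = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return 100.0 * matches / len(scorer_a)


# Toy protocol with five propositions (True = scored as recalled).
scorer_a = [True, True, False, False, True]
scorer_b = [True, False, False, False, True]

agreement = percent_agreement(scorer_a, scorer_b)
recall_a = percent_recalled(scorer_a)

# Count of decisions per scorer implied by the figures in the text:
# 12 protocols at an average of 230 propositions each.
decisions_per_scorer = 12 * 230
```

Simple percentage agreement does not correct for agreement expected by chance, which is worth keeping in mind when most propositions are scored as not recalled.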
LISREL Methodology

The analyses reported in this paper are based on the
factor analysis measurement model in LISREL (Joreskog
& Sorbom, 1979). In modified LISREL notation, the
measurement model expresses the covariance matrix of
the observed variables in the population, $\Sigma$, as

\[ \Sigma = \Lambda \Phi \Lambda' + \Theta, \qquad (1) \]

where $\Lambda$ is a p x m matrix of factor loadings, $\Phi$ is the
covariance matrix of the factors, and $\Theta$ is the covariance
matrix of the residual (unique) components.
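
The measurement model in (1) can be illustrated by computing the covariance matrix implied by a given set of loadings, factor covariances, and residual covariances. The sketch below uses plain Python lists; the function names and the numeric values (four observed variables, two factors) are hypothetical and do not correspond to any model estimated in this study.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def implied_covariance(lam, phi, theta):
    """Model-implied covariance: lam * phi * lam' + theta."""
    common = matmul(matmul(lam, phi), transpose(lam))
    return [[c + t for c, t in zip(crow, trow)]
            for crow, trow in zip(common, theta)]


# Hypothetical model: two markers per factor. Zero loadings are fixed
# parameters, stating that a variable is unrelated to that factor.
lam = [[1.0, 0.0],
       [0.8, 0.0],
       [0.0, 1.0],
       [0.0, 0.5]]
phi = [[1.0, 0.5],
       [0.5, 2.0]]
theta = [[0.5, 0.0, 0.0, 0.0],
         [0.0, 0.4, 0.0, 0.0],
         [0.0, 0.0, 0.3, 0.0],
         [0.0, 0.0, 0.0, 0.2]]

sigma = implied_covariance(lam, phi, theta)
```

Estimation works in the opposite direction: the free elements of the three matrices are chosen so that the implied matrix reproduces the sample covariance matrix as closely as possible.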

It is necessary to specify a model that has a unique
solution for the parameters by placing a sufficient number
of restrictions on the equations in (1) to identify the
remaining unknowns. Restrictions are specified by either
fixing parameters to a known value a priori (e.g., requiring
that a variable is unrelated to a factor by fixing its
regression coefficient in the factor loading matrix $\Lambda$ to 0),
or constraining a set of two or more parameters to be
equal. One of the advantages of LISREL's equality
constraints is that parameters may be constrained equal
between different age groups. Overidentified
models provide a likelihood ratio $\chi^2$ test statistic
that may be used to test the goodness of fit of the model.
Differences in $\chi^2$ between two alternative models are
particularly useful for hypothesis testing. For example,
the difference in $\chi^2$ between a model forcing all text
memory correlations with intelligence factors to be zero,
and a model freely estimating the correlations, is a
likelihood ratio test of the null hypothesis that the
correlations are in fact zero in the population.
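
The nested-model comparison described above amounts to referring the difference between two fit statistics to a chi-square distribution whose degrees of freedom equal the difference in model degrees of freedom. The sketch below is a standard-library illustration with invented fit values; the function names are assumptions, and the upper-tail probability is computed from a series expansion of the incomplete gamma function rather than by any routine in LISREL.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function; adequate for the moderate values typical of model comparisons.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    half = x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half / (a + n)
        total += term
        n += 1
    log_lower = a * math.log(half) - half - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_lower))


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Likelihood ratio test of the extra restrictions in the nested model."""
    d_df = df_restricted - df_free
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    d_chi2 = chi2_restricted - chi2_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Invented fit statistics: six correlations fixed to zero versus freely estimated.
d_chi2, d_df, p_value = chi2_difference_test(130.0, 100, 110.0, 94)
```

With these invented values the difference is 20.0 on 6 degrees of freedom, so the hypothesis that all six correlations are zero would be rejected at conventional levels.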

In the present data analysis, the small sample sizes of
50 subjects per age group require some caution in the
use of LISREL and $\chi^2$ testing. First, the assumption that
the sample covariance matrices provide asymptotic estimates
of the population covariance matrices may be
violated. The consequences of violating this assumption
include the possibility that model parameters may be
somewhat sample specific and may not be replicable in
larger, independent samples. Thus, the analyses reported
here should be considered exploratory attempts at model
building, which must be replicated and extended on new
samples. Second, the small sample sizes mean that the
likelihood ratio tests have relatively low statistical power.
The greater possibility of Type II errors creates a special
problem for LISREL models: it is possible to accept a set
of restrictions that are in fact untenable in the population,
and would be shown to be so had a larger sample size
been employed. The analyses reported here were conducted
with careful attention to this issue.

In multiple groups analysis, it is necessary to estimate
factor models using covariance metric and sample covariance
matrices rather than to analyze separately standardized
correlation matrices. Standardization could obscure
invariant relationships because of group differences
in observed variances (see Cunningham, 1978; Joreskog,
1971). The analyses reported here were all conducted in
covariance metric, and LISREL's maximum likelihood
parameter estimates and their standard errors are therefore
in unstandardized form. Because standardized statistics
are easier to interpret, we also report parameter estimates
that have been rescaled to standardized metric.1
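
The report does not give the rescaling formula used. As an assumption, the usual conversion from covariance metric to standardized metric divides each covariance by the product of the two standard deviations, as sketched below with a hypothetical two-variable matrix.

```python
import math


def cov_to_corr(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical covariance matrix: variances 4 and 9, covariance 3.
cov = [[4.0, 3.0],
       [3.0, 9.0]]
corr = cov_to_corr(cov)
```

Two groups with identical covariances but different variances would yield different correlations under this rescaling, which is why the group comparisons themselves are carried out in covariance metric.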


Results 

The data analysis consisted of two parts.
The  first  part  examined  differences  in  text 
recall  as  a  function  of  age,  delay  interval,  and 
story.  The  second  part  assessed  relations  be¬ 
tween  text  recall  performance  and  the  ability 
variables. 

Age  Differences  in  Text  Recall 

In order to examine differences in gist
recall, a 3 (age) x 3 (delay interval) x 4 (story)
mixed-model analysis of variance (ANOVA)
with repeated measures on the last two
factors was performed on the percentage of
correctly recalled propositions.2 The analysis
revealed significant main effects of age, F(2,
147) = 33.89, p < .001, and delay interval,
F(2, 294) = 323.38, p < .001. Newman-Keuls
analyses conducted at p < .05 revealed that
the younger adults (M = 17.33) recalled a
significantly greater percentage of propositions
than the older adults (M = 6.20). The two
younger groups and the two older groups did
not differ significantly. Newman-Keuls analyses
also revealed that the participants recalled
a significantly greater percentage of propositions
at immediate recall (M = 18.49) than
following delays of 1 (M = 10.34) or 4 weeks
(M = 8.69). There was no significant difference
between the 1- and 4-week intervals.

The  analysis  also  revealed  a  significant 
interaction  of  age  with  delay  interval.  F[ 4, 
294)  =  4.27,  pc.Ol,  shown  in  Figure  1. 
Neuman-Keuls  analyses  indicated  that  at  all 
three  recall  tests,  the  younger  and  middle- 
aged  adults  recalled  a  significantly  greater 
percentage  of  propositions  than  the  older 
adults.  The  two  younger  groups  did  not  differ 
significantly.  As  shown  in  Figure  1,  however, 
the  differences  between  the  age  groups  are 
somewhat  greater  at  the  immediate  recall  test 


1 We do not report the full model specification or the
maximum likelihood estimates for all models. Readers
interested in a more detailed description of the specification
and tables of maximum likelihood estimates for
all models should write to C. Hertzog.
2 Mixed-model F tests may be positively biased if the
circularity assumption is violated; however, multivariate
significance tests for repeated measures effects agreed
with the mixed-model tests in all cases.
than they are at the delayed recall tests.
Nevertheless, this interaction may actually
reflect the fact that the older adults are
exhibiting a "floor effect" at the later delay
intervals. Thus, given the similarity of the
curves in Figure 1, the most reasonable interpretation
is that the rate of forgetting is
similar for all three age groups.

Finally, the analysis also revealed a significant
main effect of story, F(3, 441) = 47.94,
p < .001, and an interaction of this variable
with age, F(6, 441) = 3.53, p < .01. These
effects were a function of the fact that the
younger and middle-aged adults recalled one
of the stories better than the others. Because
the four stories were not selected along any
a priori dimensions, these effects were not
interpreted further.
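The degrees of freedom quoted for these F tests follow from the design itself. The sketch below is only a bookkeeping check, not the authors' analysis code; it assumes 150 participants in total, a figure inferred here from the reported F(2, 147) rather than stated in this section, and the function name is hypothetical.

```python
# Degrees of freedom for a 3 (age) x 3 (delay) x 4 (story) mixed design
# with repeated measures on delay and story.
# ASSUMPTION: n_subjects = 150, inferred from the reported F(2, 147).


def mixed_design_df(n_subjects, n_groups, within_levels):
    """Return {effect: (df_effect, df_error)} for main effects and group interactions."""
    df_between_error = n_subjects - n_groups
    table = {"age": (n_groups - 1, df_between_error)}
    for name, levels in within_levels.items():
        df_within = levels - 1
        # Error term for a within-subject effect: (levels - 1) x between-subjects error df.
        df_error = df_within * df_between_error
        table[name] = (df_within, df_error)
        table["age x " + name] = ((n_groups - 1) * df_within, df_error)
    return table


df_table = mixed_design_df(150, 3, {"delay": 3, "story": 4})
for effect, (df1, df2) in df_table.items():
    print(f"{effect}: F({df1}, {df2})")
```

Under that assumption the sketch reproduces the denominators 147, 294, and 441 that appear in the reported tests.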

Intelligence-Text Recall Relationships:

Age × Ability Interactions

Our approach to testing Age × Ability interactions
in text recall performance involves
(a) the use of factor analysis to define ability
factors; (b) the computation of ability factor
scores using the factor regression method;
and (c) the joint regression analysis of text
recall performance on ability factors, age,
and Ability × Age interaction terms.

Analysis of intellectual variables. We first
confirmed the expected pattern of age differences
in psychometric intelligence by computing
a multivariate analysis of variance
(MANOVA) on the age factor for all 12 ability
variables, using a subsample of 143 subjects
with complete psychometric data. There were
significant age differences in intelligence,
approximate F(24, 258) = 8.25, p < .001. Univariate
tests (not reported here in the interests
of brevity) indicated significant age differences
on all subtests except Vocabulary I, with the
largest age differences on Letter Series and
Letter Sets. Thus, there were significant age
differences in ability (consistent in pattern
and magnitude with previous reports in the
literature, e.g., Horn & Donaldson, 1980),


Figure 1. Percentage of propositions recalled as a function of age and delay interval, averaged over stories.
which could contribute to the observed age
differences in text memory performance.

In order to address the intelligence-text
recall relationship, we conducted a confirmatory
factor analysis on the 12 psychometric
subtests. As indicated above, the intelligence
subtests were originally selected in order to
measure the six primary ability factors listed
in Table 1. The initial confirmatory factor
analysis specified this six-factor model with
all loadings except those listed in Table 1
fixed to zero. The results indicated that the
primary ability factor model was "overfit,"
with a small χ² and factor correlations uniformly
high (generally in the .7 to .9 range).
These results were problematic for any attempt
to correlate text memory performance
with primary ability factors in order to identify
ability-specific differences in relations
with text recall performance, because each
subtest has a substantial regression on a
second-order general intelligence factor (g).
If age groups differ in the magnitude of
relationship of primary ability factors to a
second-order g factor, we could detect differences
in correlations between text recall and
two primary ability factors (e.g., Verbal Comprehension
vs. Memory Span) even when the
only meaningful relationship was between g
and text recall. We therefore opted for a
factor analysis model that directly modeled g
as one of the factors and then represented
the other ability factors as residual or group
factors. We consider this model to be a
defensible representation of the factor structure
that could be meaningfully used to determine
whether intelligence-text recall relationships
were a function of g or more specific
factors such as Associative Memory, Verbal
Comprehension, and Associational Fluency.

We proceeded to estimate a model specifying
a general intelligence factor plus four
specific factors: Verbal Comprehension, Verbal
Productive Thinking, Memory Span, and Associative
Memory.3 Our initial results forced
several modifications of this model. Although
the Memory Span and Associative Memory
variables have been conceptualized as loading
on Horn's Secondary Acquisition and
Retrieval factor (Horn & Donaldson, 1980),
we did not find a Memory Span factor independent
of g. The results also showed that
the Theme subtest did not load on the
Verbal Productive Thinking factor. Subsequent
models fixed this factor loading to zero.
The fit of the modified four-factor model was
excellent, χ²(44, N = 143) = 49.97, p = .25,
F = .176, indicating the model was a plausible
representation of the factor structure in the
entire sample.
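The probability level attached to this fit statistic can be checked by hand. The sketch below is an illustration only, not part of the original analysis; it uses the closed-form upper-tail probability of a chi-square variate, which is exact when the degrees of freedom are even (as with 44 here), so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x); exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Reported fit of the modified four-factor model: chi-square(44) = 49.97.
p_fit = chi2_upper_tail_even_df(49.97, 44)
print(f"p = {p_fit:.2f}")
```

A large p value in this context means the model-implied covariances do not differ reliably from the observed ones, which is the sense in which the fit is described as excellent.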

We  also  examined  the  issue  of  age-group 
differences  in  factor  structure.  If  different 
factor  models  were  required  to  account  for 
the  covariances  among  the  psychometric 
measures,  the  measurement  equivalence  of 
these  ability  measures  across  age  groups  would 
be  called  into  question.  An  important  impli¬ 
cation  of  a  lack  of  factorial  invariance  for 
the  present  analysis  would  be  that  the  rela¬ 
tionship  between  ability  factor  scores  and  text 
recall  performance  could  not  be  examined 
by  regression  analysis  on  the  entire  sample, 
because  the  relationship  of  measures  to  ability 
factors  would  vary  with  age. 

As  shown  by  Meredith  (1964),  group  selec¬ 
tion  from  a  population  for  which  a  common 
factor  model  holds  will  yield  an  invariant 
raw score (unstandardized) factor pattern
matrix, but unique variances, factor variances,
and factor covariances may vary because of
selection effects. An implication of Meredith's
work is that empirical evidence indicating an
invariant raw score factor pattern matrix is
consistent  with  a  simple  selection  model, 
which,  if  true,  would  justify  further  analysis 
of  ability-text  recall  relationships  based  on 
the  single  group  factor  solution. 
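In the matrix notation used in this section (Λ for the factor pattern matrix, Φ for the factor covariance matrix, Θ for the unique variances), the selection argument can be written compactly. The expression below is the standard multiple-group common factor model, given here as a reading aid; the group subscript k and the symbol Σ for the subtest covariance matrix are notational assumptions, not taken from the original text.

```latex
\Sigma_k \;=\; \Lambda\,\Phi_k\,\Lambda^{\top} + \Theta_k ,
\qquad k \in \{\text{young},\ \text{middle-aged},\ \text{old}\}
```

Only Λ carries no group subscript: under the selection model the raw score loadings are the same in every age group, while Φ and Θ are free to differ between groups.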

We therefore estimated a series of simultaneous
three-group models specifying the
same four factors: g, Verbal Productive
Thinking, Verbal Comprehension, and Associative
Memory, testing the hypotheses of
between-group equivalence in Θ, Φ, and Λ.
The hypotheses of group equivalence in Φ
and Θ were not rejected, χ²(14, N = 143) =
22.10, .05 < p < .10, and χ²(24, N = 143) =
32.14, p < .10, respectively. However, the
absolute χ² test was statistically significant


3 Induction was not estimated as a group factor because
of its close relationship to g (Vernon, 1979); Verbal
Productive Thinking is Horn's second-order factor combining
Associational and Ideational Fluency (Horn &
Donaldson, 1980); we used Verbal Productive Thinking
because the estimated correlation of the two primary
ability factors exceeded .9 in the six-factor model.
Table 2

Results of the Ability Factor Analysis in the Three Age Groups


SubicB 

Factor  Pattern  Main  (A) 

Lr 

ique  Varan:: 
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VPT 

AM 

Young 

Miid'e 

C  : 
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cr 

O' 
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.19 
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Al 

.75 

cr 

0* 
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.19 

.  - 
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.43 

cr 

.62 

0* 

63 
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1  " 
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O' 
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• 
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19 
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0* 

.97 

95 
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0* 

V 

cr 
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.'4 

i 

Bacr»ard  Span 

4 1 

0* 

cr 

cr 
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"0 

"  . 

Object  Number 

42 

0* 

cr 

.37 

e: 

“ 

Memo*}  for  N^ords 

60 

0* 

cr 

•’0 

0* 

0* 

(  ‘ 
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8' 

0* 

cr 

0* 

2  <■ 

2  - 

2  ■- 

Ler.r-  Sets 

65 

cr 

T 

cr 

.59 

5* 

Note. g = general intelligence factor; VC = Verbal Comprehension factor; VPT = Verbal Productive Thinking
factor; AM = Associative Memory factor. All elements in Λ and Θ are rescaled to a quasi-standardized
metric using the approach recommended by Jöreskog (1971).

* Fixed parameter.


for the model with all matrices invariant
over groups, χ²(200, N = 143) = 237.90, p <
.05. Given the small sample sizes, we elected
to allow Θ and Φ to vary freely over groups
in subsequent models. A model allowing all
factor loadings to vary freely over groups did
not significantly improve the fit of the model.

Table 2 reports the scaled factor loadings,
factor variances and covariances, and unique
variances for the model requiring Λ to be
invariant over groups but allowing group
differences in Θ and Φ. The model provided
an adequate fit to the data, χ²(164,
N = 143) = 184.66, p > .10. As can be seen
from Table 2, g was marked by high loadings
for the Induction subtests, Letter Sets and
Letter Series, but there were significant loadings
on all subtests. Relatively high loadings
were found for the Verbal Comprehension
subtests and for Memory for Words on g as
well. The Verbal Comprehension factor was
well defined by both subtests, whereas Verbal
Productive Thinking was defined most predominantly
by Controlled Associations.
Associative Memory was weighted toward
Memory for Words.

Perhaps the most interesting results were
found in the group differences in Φ. Table 3
shows that, although the correlations between
the Verbal Productive Thinking, Verbal Com-


Table 3

Results of Factor Analysis in the Three Age Groups: Factor Covariance Matrices (Φ)


Young  Middle  Old 
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O' 
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J7 
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O' 
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68 

46 

0* 
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61 

Cr' 

VR¬ 

O' 

1  77 
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O' 

6  97 

12.96 

A3 

O' 
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7  ;  6 

*  “ 

AM 

O' 

1  46 

-.64 

no 

O' 

1.73 

2.56 

181 

cr 

1  99 

1  33 

4  * 

Note. g = general intelligence factor; VC = Verbal Comprehension factor; VPT = Verbal Productive Thinking
factor; AM = Associative Memory factor. All elements in Λ and Θ are rescaled to a quasi-standardized
metric using the approach recommended by Jöreskog (1971). Correlations are above the diagonal.

* Fixed parameter.
prehension, and Associative Memory factors
were relatively modest in magnitude in the
young group, they were generally larger in
the middle-aged and older groups.

For  both  the  covariances  and  correlations, 
the  largest  differences  seemed  to  be  between 
the  young  group  and  the  two  older  groups. 
In general, the middle-aged group was more
variable in ability, although the old group
was the most variable in Verbal Comprehension
and the young group had the largest
variance  in  Associative  Memory.  The  ten¬ 
dency  for  the  older  group  to  have  higher 
correlations  among  abilities  than  the  middle- 
aged  group  is  qualified  by  the  group  differ¬ 
ences  in  variances:  The  covariances  among 
abilities  differ  little  between  the  two  older 
groups. 

Because  the  results  from  the  multiple  group 
analysis  were  consistent  with  the  selection 
hypothesis  discussed  by  Meredith  (1964), 
pooling  the  data  over  age  groups  for  further 
analysis was justified. We used LISREL's factor
score regression matrix to estimate ability
factor scores for the entire sample. Table 4
gives the calculated correlations among the
factor score variables, which agreed relatively
well with the LISREL maximum likelihood
estimates of the factor correlations. Note,
however,  that  the  substantial  correlations  be¬ 
tween  the  specific  ability  factor  scores,  espe¬ 
cially  between  Verbal  Comprehension  and 
Verbal  Productive  Thinking,  create  the  pos¬ 
sibility  of  suppression  effects  in  the  regression 
analysis. 

Intelligence × Age interactions. The regression
analysis of age and ability variables
is equivalent to the analysis of covariance
(ANCOVA) approach, but with the interaction
between the independent variable (age) and
the covariates (ability factor scores) explicitly
represented in the design. Tests of such interaction
terms are often treated only as tests of
ANCOVA assumptions. However, the ANCOVA
analogy is misleading here because the interactions
provide the critical information regarding
the consistency of age differences in
text memory across levels of ability and are
therefore of substantive interest in their own
right (Cohen & Cohen, 1975).

The interaction variables were calculated
by multiplication of two orthonormal contrasts
across the age factor with the factor

Table 4

Factor Correlations for Intelligence Variables: Single Group Analysis

Factor    g      VC              VPT             AM

g         1ᵃ     .0??            .083            .???
VC        0      1               .739            .520
VPT       0      .398 (.115)ᵇ    1               .31?
AM        0      .489 (.127)     .28? (.137)     1

Note. g = general intelligence factor; VC = Verbal
Comprehension factor; VPT = Verbal Productive
Thinking factor; AM = Associative Memory factor.
LISREL estimates and standard errors for the correlations
among ability factors are given below the diagonal, and
correlations among estimated factor scores are given
above the diagonal. A question mark stands for a digit
that is not legible in the source copy.

ᵃ All zeroes and ones are fixed parameters in the LISREL
model.

ᵇ Standard errors are in parentheses.


score variables after these latter variables had
been transformed to z scores. The two contrasts
selected compared (a) middle-aged with
old subjects and (b) young subjects against
the combined middle-aged and old age
groups. The regression equations therefore
included 14 independent variables organized
in three sets: (a) the four ability factors, (b)
two age contrasts, and (c) eight interaction
variables representing the products of these
first two sets of variables. A separate regression
analysis was conducted for each of the
three delay levels (immediate, 1 week, and 4
weeks). Before examining partial regression
coefficients we calculated hierarchical significance
tests of the increment in R² for three sets
of independent variables: the four ability
factors, the two age contrasts, and the eight
Ability × Age interaction variables.
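The construction of the 14 predictors can be sketched as follows. This is a minimal illustration, not the authors' code: the contrast weights shown are one orthonormal coding consistent with the two comparisons described, and their signs and scaling, the names `AGE1`, `AGE2`, and `design_row`, and the example z scores are all assumptions made for the example.

```python
import math

# Contrast weights across (young, middle-aged, old).
# ASSUMPTION: one possible orthonormal coding; the original weights are not given.
AGE1 = {"young": 0.0, "middle": 1 / math.sqrt(2), "old": -1 / math.sqrt(2)}
AGE2 = {"young": 2 / math.sqrt(6), "middle": -1 / math.sqrt(6), "old": -1 / math.sqrt(6)}
FACTORS = ("g", "VC", "VPT", "AM")


def design_row(age_group, z_scores):
    """Build the 14 predictors for one subject.

    z_scores maps each ability factor to that subject's standardized factor score.
    Order: 4 ability scores, 2 age contrasts, 8 contrast-by-ability products.
    """
    contrasts = [AGE1[age_group], AGE2[age_group]]
    abilities = [z_scores[f] for f in FACTORS]
    interactions = [c * a for c in contrasts for a in abilities]
    return abilities + contrasts + interactions


# Hypothetical subject in the old group (z scores invented for illustration).
row = design_row("old", {"g": -0.5, "VC": 1.2, "VPT": 0.3, "AM": -1.0})
print(len(row))
```

Because the contrasts sum to zero and are mutually orthogonal, each product term isolates how the slope of recall on an ability differs between the groups being compared.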

The results of the hierarchical significance
tests are given in Table 5. For each delay
level, the overall R² was highly reliable, with
the adjusted R² of greater than .5 for each
equation. Thus, a large proportion of text
recall variance was accounted for by the
model. The increments to R² for the ability
factors were large and significant for all three
delay conditions, accounting for greater than
80% of the total R² in each case. The overall
test of age differences was also significant at
each delay condition, with R² smallest at
immediate recall. Adjusted for shrinkage, age
accounted for between 3% and 4% of the
variance across all delay conditions. This is
Table 5

Summary of R² and Statistical Tests for Regression Analysis With Age and Intelligence Factors

Dependent variables   Independent variables
(delay conditions)    (Set)                   R²     R²ᵃ    ΔR²ᵇ   ΔR²ᶜ   F       df         p

Immediate             IQ                     .452   .436   .452   .436   28.42   (4, 138)   < .001
                      Age                    .486   .463   .034   .027    4.46   (2, 136)   < .05
                      IQ × Age               .557   .508   .071   .045    2.56   (8, 128)   < .05
                      (Age alone)ᵈ           .255   .244    —      —       —        —         —

1 week                IQ                     .478   .462   .478   .462   31.53   (4, 138)   < .001
                      Age                    .523   .502   .045   .040    6.43   (2, 136)   < .01
                      IQ × Age               .570   .523   .047   .021    1.75   (8, 128)   > .05
                      (Age alone)ᵈ           .316   .306    —      —       —        —         —

4 weeks               IQ                     .475   .460   .475   .460   31.21   (4, 138)   < .001
                      Age                    .516   .494   .041   .034    5.73   (2, 136)   < .01
                      IQ × Age               .570   .523   .054   .029    2.01   (8, 128)   > .05
                      (Age alone)ᵈ           .322   .312    —      —       —        —         —

Note. IQ = intelligence factor scores. Regression statistics are for hierarchical regression entering three sets of
independent variables: four intelligence factors, two age contrasts, and eight interaction variables.

ᵃ R² adjusted for shrinkage.

ᵇ Change in unadjusted R² from previous set.

ᶜ Change in shrinkage-adjusted R² from previous set.

ᵈ R² for two age contrasts as only independent variables (i.e., ignoring intelligence).


clearly a major reduction in the prediction
of text memory performance by age, because
it accounted for between 20% and 30% of
the variance when entered without the ability
variables (see Table 5). Nevertheless, the analysis
indicates there are age differences in text
memory performance that are independent
of intellectual ability.
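The hierarchical tests summarized in Table 5 can be approximated from the reported R² values. The sketch below is an illustration under stated assumptions rather than the original computation: it assumes N = 143, tests each increment against the residual of the model at that step (a choice that matches the reported degrees of freedom), and uses the usual shrinkage formula for adjusted R². Function names are hypothetical, and small discrepancies from the table are expected because the inputs are rounded to three decimals.

```python
def adjusted_r2(r2, n, k):
    """Shrinkage-adjusted R-squared for k predictors and n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def increment_f(r2_prev, r2_new, k_prev, k_new, n):
    """F test for the R-squared increment when a set of predictors is added.

    ASSUMPTION: the error term comes from the model at the current step,
    which reproduces the reported df of (4, 138), (2, 136), and (8, 128).
    """
    df_num = k_new - k_prev
    df_den = n - k_new - 1
    f = ((r2_new - r2_prev) / df_num) / ((1 - r2_new) / df_den)
    return f, (df_num, df_den)


N = 143
# Immediate recall: (set name, cumulative number of predictors, cumulative R-squared).
steps = [("IQ", 4, 0.452), ("Age", 6, 0.486), ("IQ x Age", 14, 0.557)]

results = {}
r2_prev, k_prev = 0.0, 0
for name, k, r2 in steps:
    f, df = increment_f(r2_prev, r2, k_prev, k, N)
    results[name] = (f, df, adjusted_r2(r2, N, k))
    print(f"{name}: F{df} = {f:.2f}, adjusted R2 = {results[name][2]:.3f}")
    r2_prev, k_prev = r2, k
```

Run on the immediate recall row, this gives F values close to the tabled 28.42, 4.46, and 2.56, which supports reading the middle columns of Table 5 as cumulative and incremental R².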

The Ability × Age interactions were not
consistently reliable, exceeding a 5% alpha
level only for the immediate recall condition
(although there were 10% level trends for
both longer delay intervals). Thus, all three
types of variables provided independent contributions
to the total R², with the largest
amount of variance accounted for by the
ability-text memory relationships measured
at the level of the ability factors; age differences
in text memory covary highly with age
differences in multiple intellectual abilities;
however, age differences in text memory cannot
be eliminated by partialing ability differences;
and there may be Age × Ability interactions
in text memory performance that
qualify the existence of age main effects.

Table 6 reports the individual standardized
regression parameter estimates and their
standard errors, which may be used to calculate
t tests of the null hypothesis that the
regression weights are zero in the population.
The pattern of results clearly differentiated
the linear relationships of ability factor scores
to text memory performance from the Age ×
Intelligence interaction effects. The general
intellectual factor, g, provided the best independent
prediction of text memory performance,
but did not produce an interaction
effect in conjunction with age at any delay
level. The Verbal Comprehension and Associative
Memory factors also provided independent
prediction of text memory performance,
although at a much smaller level of
magnitude. Verbal Productive Thinking did
not provide statistically reliable independent
prediction of text memory performance. Of
course, the relatively small independent contributions
of the ability factors other than g
are in part a function of mutual inhibition,
considering the high intercorrelation between
Verbal Comprehension and Verbal Productive
Thinking.
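As the paragraph notes, each entry in Table 6 yields a t ratio when the estimate is divided by its standard error. The sketch below shows that calculation for three legible entries. It is an approximation only: the two-sided probability uses a normal distribution in place of the t distribution, an assumption that is reasonable with roughly 128 residual degrees of freedom but is not the authors' stated procedure.

```python
from statistics import NormalDist


def t_ratio(estimate, standard_error):
    """Ratio of a regression estimate to its standard error."""
    return estimate / standard_error


def approx_two_sided_p(t):
    """Two-sided probability from a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# (estimate, standard error) pairs read from Table 6.
entries = {
    "g, 1 week": (0.38, 0.09),
    "VPT x AGE1, immediate": (-0.29, 0.10),
    "VPT, immediate": (0.05, 0.10),
}

p_values = {}
for label, (beta, se) in entries.items():
    t = t_ratio(beta, se)
    p_values[label] = approx_two_sided_p(t)
    print(f"{label}: t = {t:.2f}, approximate p = {p_values[label]:.4f}")
```

The contrast between the first and third entries illustrates the point made in the text: g predicts recall strongly on its own, whereas Verbal Productive Thinking contributes mainly through its interaction with age.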

In contrast to the simple linear ability
effects, Verbal Productive Thinking contributed
most to the significant overall interaction
at immediate recall, interacting with both age
contrasts. The direction of effects was in the
predicted direction, with age differences between
all three groups being smaller at higher
Table 6


Standardized Regression Parameters for Three Delay Conditions on Intelligence Factors and Age


Source 

Delay condition (β)

Immediate 

1  Week 

4  Weeks 

VC 

.23  (.nr* 

.26  (.ID- 

.18  (.11)— 

VPT 

.05  (.10) 

.00  (.09) 

.10  (.10) 

AM 

.14  (.07 r* 

.13  (.07)* 

.13  (.07)* 

i 

J9  (.09)*— 

.38  (.09)— 

.37  (.09)— 

AGE1 

.18  (.09)* 

27  (.09)— 

.24  (.09)— 

AGE2 

.24  (.11)” 

.34  (.11)— 

•37  (.ID— 

VC  x  AGEI 

.16  (.11) 

.08  (.11) 

.07  (.11) 

VPT  x  AGEI 

-.29  (.10)*** 

-.10  (.10) 

-.15  (.10) 

AM  x  AGEl 

.02  (.07) 

.04  (.07) 

.10  (.07) 

g  X  AGEI 

.08  (.08) 

.13  (.08) 

.12  (.081 

VC  X  AGE2 

.29  (.10)*** 

.28  (.10)—* 

.12  (.10)— 

vpt  x  age: 

-2$  (.09)—" 

-.13  (.09) 

-.03  (.09i 

am  x  age: 

-.16  (.07)- 

-.11  (.07) 

-.15  (.07)— 

g  x  age: 

.02  (.08) 

0.0  (.08) 

-.01  (.08) 

AGE  1  (alone) 

.37  (.07)*— 

.40  (.07)— 

.43  (.07)— 

AGE  2  (alone) 

.35  (.07)— 

.39  (.07)— 

.37  (,0'i— 

Note. VC = Verbal Comprehension; VPT = Verbal Productive Thinking; AM = Associative Memory; g = General
Intelligence; AGE1 = first age contrast (middle aged vs. old); AGE2 = second age contrast (young vs. middle aged
and old).

ᵃ Standard errors are in parentheses.

Significance levels for t test of H0: β = 0 are denoted as follows: * p < .10. ** p < .05. *** p < .01. **** p < .001.


levels  of  fluency.  The  other  significant  inter¬ 
action  terms  involved  only  the  contrast  be¬ 
tween  the  young  group  and  the  two  older 
groups  for  both  Verbal  Comprehension  and 
Associative Memory. The pattern of interaction
found for Associative Memory was similar
to that for Verbal Productive Thinking. However,
to  our  surprise  we  discovered  that  the  pattern 
was  actually  reversed  for  Verbal  Comprehen¬ 
sion — age  differences  appeared  to  be  greater 
at  the  higher  levels  of  verbal  ability!  We 
verified  the  direction  of  this  relationship 
through  examination  of  a  bivariate  scatterplot 
and  a  regression  using  the  original  vocabulary 
subtests.  This  further  analysis  indicated  that 
the  strength  of  the  effect  was  in  part  a  function 
of  classic  suppression;  the  high  positive  cor¬ 
relation  between  Verbal  Comprehension  and 
Verbal  Productive  Thinking,  combined  with 
the  opposite  directions  of  interactions  of  each 
of  the  two  ability  factors  with  age,  helped 
produce  the  statistically  reliable  positive 
regression  coefficient  for  the  Age  X  Verbal 
Comprehension  interaction.  Nevertheless,  the 
direction  of  the  effect,  even  in  the  bivariate 
plots,  definitely  showed  increasing  age  differ¬ 
ences  at  the  highest  levels  of  Verbal  Compre¬ 
hension. 


This  Age  X  Verbal  Comprehension  inter¬ 
action  term  was  statistically  reliable  at  all 
delay  levels.  In  contrast,  the  Age  X  Verbal 
Productive  Thinking  and  Age  X  Associative 
Memory  interactions  were  not  significant  at 
the  longer  delay  intervals.  Indeed,  considering 
all  the  interaction  terms  together,  it  is  clear 
that  the  interaction  effects  are  at  best  small 
in  magnitude  and  should  not  be  given  great 
interpretive weight.4 However, when one considers
that the reliable interaction effect for
Verbal  Comprehension  was  the  reverse  of  the 
predicted  relationship,  it  appears  safe  to  con¬ 
clude  that  the  hypothesis  of  reduced  age 
differences  at  higher  ability  levels  was  not 
supported  by  the  data. 


4 In fact, it is possible to reduce the size of the
interaction term for Verbal Productive Thinking by
changing the factor model specification. The interaction
is most strongly present (at the level of single subtests)
for Controlled Associations; rotating the factor towards the
Ideational Fluency subtests pushes the Age × Verbal Productive
Thinking interactions below significance. Because
the model used in this analysis is more parsimonious as
a representation of the specific verbal factors, we have
reported its results alone. Nevertheless, the evidence for
interaction effects indicates that any effects in this population
are relatively small in magnitude.




D. HULTSCH, C. HERTZOG, AND R. DIXON


Intelligence-Text Recall Relationships:
Correlational Analyses

The correlational analyses using LISREL
were  designed  to  explicate  the  relationship 
between  text  recall  performance  and  psycho¬ 
metric  intelligence  in  the  three  age  groups. 
Our  interest  was  in  determining  whether 
group  differences  in  correlations  among  sub¬ 
test  scores  and  text  memory  (not  reported 
here)  reflected  differential  relationships  of 
text  recall  with  underlying  dimensions  of 
intelligence  for  the  three  age  groups. 

In  order  to  examine  the  text  recall  corre¬ 
lations  with  intelligence,  we  introduced  Text 
Recall  as  an  additional  factor  in  the  factor 
model of the intelligence subtests. This model
allows us to represent the covariances between
the text recall variables and intelligence subtests
as being mediated through the covariances
between the text recall and intelligence
factors, which were modeled in Φ. We tested
the ability-text recall relationships with the
immediate recall data. The results were then
replicated  at  the  two  longer  delay  intervals. 

A first model forcing all four covariances
between Text Recall and the four intelligence
factors to equal zero provided a poor fit to
the data, χ²(321, N = 143) = 469.19, p <
.001. An alternative model allowing the covariances
to be freely estimated fit considerably
better, χ²(309, N = 143) = 386.69, p <
.01. The difference in χ² tested the (multivariate)
null hypothesis of zero correlations between
Text Recall and the intelligence factors.
This hypothesis was rejected, χ²(12,
N = 143) = 82.50, p < .001. We also tested the null
hypothesis of group equivalence in the text
recall-intelligence correlations by introducing
a scaling vector in the model (thus allowing
for group differences in variances) and constraining
the scaled Text Recall-ability covariances
to be equal for the three age groups.
This model produced a significant increase
in χ² (20.99 with 8 df, N = 143, p < .01).
The multivariate null hypothesis of equal
correlations between age groups was therefore
rejected.
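The nested-model comparisons above reduce to simple arithmetic on the reported statistics. The sketch below reproduces the difference test from the two reported χ² values and attaches approximate probability levels; it is illustrative only, the function names are hypothetical, and the closed-form tail probability it uses is exact only for even degrees of freedom (12 and 8 here).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x); exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi2_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Difference test for two nested models: returns (delta chi-square, delta df)."""
    return chi2_restricted - chi2_free, df_restricted - df_free


# Zero-covariance model versus freely estimated covariances.
delta_chi2, delta_df = chi2_difference(469.19, 321, 386.69, 309)
p_zero_covariances = chi2_upper_tail_even_df(delta_chi2, delta_df)

# Reported increase when the covariances are constrained equal across groups.
p_equal_groups = chi2_upper_tail_even_df(20.99, 8)

print(f"difference: chi-square({delta_df}) = {delta_chi2:.2f}")
print(f"p (zero covariances) = {p_zero_covariances:.2e}")
print(f"p (equal across groups) = {p_equal_groups:.4f}")
```

The 12 degrees of freedom correspond to four Text Recall-ability covariances in each of three age groups, and the 8 degrees of freedom to constraining those four covariances to be equal across the three groups.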

Table 7 reports the Φ matrices for the
three groups, including the rescaled correlations
between Text Recall and the four intelligence
factors. The group differences in the
text recall-intelligence correlations form an
interesting pattern. In the young and middle-aged
groups, there is a statistically reliable
correlation between g and Text Recall (.58
and .52, respectively). There is also a statistically
reliable correlation between Verbal
Comprehension and Text Recall in the young
group (r = .38). However, this correlation was
only .22 in the middle-aged group, less than
the .31 correlation between Text Recall and
Associative Memory. The correlational pattern
in the old group is completely divergent. For
the old adults, the correlation between g and
Text Recall was not statistically reliable, but
the remaining correlations between Text Recall
and the other intelligence factors were
statistically significant. Indeed, the correlation
between Verbal Productive Thinking and Text
Recall was .86, which was unexpectedly high.
Given the definition of the other intelligence
factors as being orthogonal to g, the results
in the old group indicate that, in spite of the
higher magnitude of the simple correlations
among all the intelligence subtests, Text Recall
performance was more highly correlated with
the specific factors related to verbal intelligence
and memory than to general intelligence.
This was not the case in the young
and middle-aged groups. As can be seen in
Table 7, the pattern of differential correlations
between age groups replicated at the longer
delay intervals.
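The rescaled correlations in Table 7 are the factor covariances divided by the product of the two factor standard deviations. The sketch below checks this for the young group at immediate recall, using values read from the partly legible table. The variance of the Text Recall factor is not visible in this portion of the table, so it is backed out from the g entry; that step is an assumption made for illustration, and the remaining three correlations are then compared with the tabled values.

```python
import math

# Young group, immediate recall, as read from Table 7.
factor_variance = {"g": 8.91, "VC": 3.68, "VPT": 4.93, "AM": 0.94}
cov_with_text_recall = {"g": 36.31, "VC": 15.36, "VPT": -14.10, "AM": 3.19}
reported_r = {"g": 0.58, "VC": 0.38, "VPT": -0.30, "AM": 0.15}

# ASSUMPTION: Text Recall variance inferred from the g covariance and correlation.
text_recall_variance = (cov_with_text_recall["g"] / reported_r["g"]) ** 2 / factor_variance["g"]

implied_r = {
    name: cov_with_text_recall[name] / math.sqrt(factor_variance[name] * text_recall_variance)
    for name in factor_variance
}

for name, r in implied_r.items():
    print(f"{name}: implied r = {r:.2f}, reported r = {reported_r[name]:.2f}")
```

Agreement within rounding error for the three factors not used to infer the variance suggests the covariance and correlation rows of the table have been read consistently.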

We also assessed the hypothesis that the
lower levels and greater variability of years of
education in the old group produced the
differences in text recall-intelligence correlations.
This was accomplished by partialing
years of education from the factor correlations
and examining the residual correlations.
These residual correlations were highly similar
to the original correlations, ruling out group
differences in years of education as the determinant
of age differences in text recall-intelligence
correlations.

Discussion 

The present data indicate that there are
substantial age-related differences in the
amount of information recalled from meaningful
texts. These results are consistent with
those of other studies that have examined the
text recall performance of adults with relatively
modest levels of education (Cohen,
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1979;  Dixon  et  al.,  1982;  Zelinski  et  al., 
1980).  The  present  data  also  indicate  that 
there  is  little  evidence  for  age-related  differ¬ 
ences  in  the  rate  at  which  information  about 
meaningful  text  is  forgotten.  These  results 
are  also  consistent  with  most  previous  studies 
(e  g.,  Dixon  et  al.,  1982;  Gordon  &  Gark, 
1974).  Within  this  context,  however,  the  pres¬ 


ent  study  suggests  some  important  conclusions 
about  the  role  of  intellectual  ability  factors 
in  age-related  differences  in  text  recall  per¬ 
formance. 

It has been repeatedly suggested that adult age differences in text
performance may depend on the subjects' level of verbal ability
because several studies have demonstrated that age differences are
present when subjects of low to medium verbal ability are examined
and absent when subjects of high verbal ability are examined (Dixon,
et al., in press; Meyer & Rice, 1983; Taub, 1979). However, the
results of the present analysis suggest that the potential
contribution of ability factors to age-related differences in text
recall performance is more complex than previous reports might
indicate.

Table 7

Text Memory/Intelligence Correlations Within Age Groups for Three Delay Conditions

                             Intelligence factors
Age          Parameter       g                      VC                VPT               AM

Immediate
Young        Variance(a)     8.91                   3.68              4.93              0.94
             Covariance(b)   36.3[?]** ([?]1.8[?])  15.36* (7.65)     -14.10 (11.0)     3.19 (3.18)
             Correlation(c)  .58                    .38               -.30              .15
Middle-aged  Variance(a)     16.19                  7.33              12.00             0.[?]9
             Covariance(b)   54.69** (19.[?][?])    16.09 (12.33)     12.25 (16.79)     7.30 (4.3[?])
             Correlation(c)  .52                    .22               .13               .31
Old          Variance(a)     10.56                  9.69              6.81              0.9[?]
             Covariance(b)   9.84 (9.04)            33.93[?] (11.98)  43.29[?] (12.12)  9.55* (4.19)
             Correlation(c)  .16                    .57               .86               .52

1 week
Young        Variance(a)     9.99                   4.03              5.0[?]            1.01
             Covariance(b)   21.41** (7.96)         14.65[?] (5.61)   [?] ([?].53)      3.33 ([?].36)
             Correlation(c)  .4[?]                  .50               -.0[?]            .23
Middle-aged  Variance(a)     17.35                  7.29              12.35             0.85
             Covariance(b)   39.[?]0[?] (13.24)     8.61 (7.39)       9.28 (10.42)      4.30 (2.69)
             Correlation(c)  .56                    .19               .16               .28
Old          Variance(a)     11.3[?]                9.63              7.15              0.96
             Covariance(b)   6.16 (4.00)            9.92[?] (4.28)    9.06* (4.03)      3.59* (1.59)
             Correlation(c)  .26                    .45               .48               .51

4 weeks
Young        Variance(a)     10.40                  4.22              4.54              1.00
             Covariance(b)   16.05* (7.07)          11.62* (4.94)     8.46 (6.46)       1.24 (1.94)
             Correlation(c)  .39                    .45               .32               .10
Middle-aged  Variance(a)     17.64                  7.56              9.67              0.84
             Covariance(b)   33.91[?] (11.28)       6.90 (6.30)       4.63 (8.02)       4.52 (2.41)
             Correlation(c)  .57                    .18               .11               .35
Old          Variance(a)     11.41                  10.05             5.30              0.95
             Covariance(b)   4.30 (3.00)            8.75[?] (3.49)    10.61[?] (3.39)   2.48* (1.17)
             Correlation(c)  .21                    .49               .82               .45

Note. g = General Intelligence; VC = Verbal Comprehension; VPT = Verbal
Productive Thinking; AM = Associative Memory. [?] marks a digit or
significance marker that is illegible in the source copy.
(a) Variance of intelligence factor.
(b) Covariance of intelligence with text memory (standard error in parentheses).
(c) Correlation of intelligence with text memory.
Significance levels for H0: σTM = 0 denoted as follows: * p < .05. ** p < .01. *** p < .001.
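
The entries of Table 7 can be cross-checked with two small
calculations. The sketch below assumes that each covariance was tested
with a large-sample z ratio (estimate divided by its standard error);
that assumption and the function names are illustrative, not taken
from the report. It also uses the identity r = cov / (sd1 * sd2) to
back out the text memory standard deviation implied by one row.

```python
import math

def z_test(estimate, std_error):
    """Return (z, two-tailed p) for H0: parameter = 0, normal approximation."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

def stars(estimate, std_error):
    """Map the two-tailed p value onto the table's asterisk convention."""
    _, p = z_test(estimate, std_error)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def implied_sd(covariance, factor_variance, correlation):
    """Solve r = cov / (sd_factor * sd_other) for sd_other."""
    return covariance / (correlation * math.sqrt(factor_variance))

# Young group, immediate recall, Verbal Comprehension (Table 7):
# variance 3.68, covariance 15.36 (SE 7.65), correlation .38.
print(stars(15.36, 7.65))
print(round(implied_sd(15.36, 3.68, 0.38), 2))
```

Under this assumption the legible entries reproduce their printed
markers, for example one asterisk for 15.36 (7.65) and two for
21.41 (7.96).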

First, it is apparent that abilities other than Verbal Comprehension
are predictive of text recall performance. In particular, the present
results suggest that general intelligence, Verbal Productive
Thinking, and Associative Memory also correlate with individual
differences in text recall performance. In fact, the ability with the
largest overall relationship with text memory performance in the
single group analysis was g, not Verbal Comprehension. Second, the
present results show that age differences in text memory performance
covary highly with age differences in intellectual abilities. The
regression analyses indicated that age differences in text memory
performance are drastically reduced, but not eliminated, when
partialed for intellectual ability. Third, and perhaps most
significantly, the present results do not support the notion that
there is an Age × Verbal Comprehension interaction across the range
of verbal abilities such that age differences are progressively
reduced with higher ability level, as might be suggested by the
results cited above. If anything, we found evidence for larger age
differences at the highest Verbal Comprehension levels present in our
sample. The type of interaction predicted by the previous work with
extreme groups designs was only found in the immediate recall
condition for Verbal Productive Thinking and Associative Memory;
moreover, the small magnitude of the interaction effects and the
transience of the relationship with respect to delay interval
suggest, at minimum, that such interactions should be interpreted
conservatively.
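
An Age × Ability interaction of the kind discussed above is
conventionally tested by adding a product term to a regression of
recall on age and ability. The sketch below is a minimal illustration
under stated assumptions: the data are hypothetical and noise-free,
age is dummy coded, and the helper names (solve, ols, age_difference)
are illustrative rather than the authors' analysis code. A negative
product coefficient corresponds to an age difference that grows at
higher ability levels.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# HYPOTHETICAL noise-free data: age coded 0 = young, 1 = old; ability centered.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
design, recall = [], []
for age in (0.0, 1.0):
    for ability in abilities:
        design.append([1.0, age, ability, age * ability])
        recall.append(50.0 - 10.0 * age + 4.0 * ability - 1.0 * age * ability)

intercept, b_age, b_ability, b_interaction = ols(design, recall)

def age_difference(ability):
    """Model-implied old-minus-young recall difference at a given ability."""
    return b_age + b_interaction * ability

print(age_difference(-2.0), age_difference(2.0))
```

Evaluating the implied age difference at several ability levels, as in
the last line, is what shows whether the gap narrows or widens across
the ability range.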

The present results need not be viewed as contradictory to previous
findings if we allow for the fact that the population studied here is
a community population that apparently contains small proportions of
the extreme high ability/highly educated elderly. It may well be the
case that age differences are smaller only at the highest ability or
educational levels, or alternatively, that there is a small,
relatively intact subpopulation of able elderly who show little
decline in text memory performance. Comparisons of such a select
subpopulation with young adult groups might well yield little age
difference. Nevertheless, the present results speak to the issue of
the generality of the results from the previous extreme groups
comparisons. For the ability ranges studied here, the interaction
effects do not suggest the elimination of the age differences at
higher ability levels.

The final complexity in the ability-text memory relationship
discovered in the present study is the shift in patterns of
within-group correlations between text memory and intellectual
ability factors across the three age groups. In the case of the young
and middle-aged adults, the largest correlations of text memory and
ability factors occurred with g and Verbal Comprehension. However, in
the case of the old adults, the largest correlations involved Verbal
Productive Thinking, Verbal Comprehension, and Associative Memory.
General intelligence is of little value in predicting text recall
performance in the elderly. Thus, with increasing age, text recall
performance is increasingly related to specific intellectual
abilities including Verbal Productive Thinking and Associative Memory
as well as Verbal Comprehension.
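
A shift in correlations between independent groups is commonly
examined with Fisher's r-to-z transformation. The sketch below is a
rough illustration only: the group size of 50 is hypothetical (group
sizes are not given in this section), the function names are
illustrative, and the correlations in Table 7 are model-based factor
correlations, so this simple test is not the authors' procedure.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """z statistic for H0: rho1 == rho2 using Fisher's r-to-z transformation."""
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / standard_error

def two_tailed_p(z):
    """Two-tailed p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

HYPOTHETICAL_N = 50  # assumed group size, not reported in this section

# Immediate recall, correlation of g with text memory: young .58, old .16.
z = compare_independent_correlations(0.58, HYPOTHETICAL_N, 0.16, HYPOTHETICAL_N)
print(round(z, 2), round(two_tailed_p(z), 3))
```

Because the standard error depends only on the two sample sizes, the
same difference in correlations is far less convincing in small
groups.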

The reduced correlation between g and text memory performance in the
old group is rather surprising. One of the consistently replicated
findings in the literature on adult age differences in the factor
structure of psychometric intelligence is that older populations have
a less-differentiated factor structure than younger populations,
usually manifested in a higher correlation among primary ability
factors (e.g., Baltes, Cornelius, Spiro, Nesselroade, & Willis, 1980;
Cunningham, 1980). A developmental hypothesis that has derived from
this pattern is that of reintegration or de-differentiation of
intelligence with aging such that individual differences in cognitive
activity are determined less by specific skills (as represented by
the range of primary intellectual abilities) and more by general
cognitive efficiency (Reinert, 1970). Some researchers (e.g., Birren,
Woods, & Williams, 1979) have drawn a parallel to other studies
suggesting a general slowing of cognitive speed with aging and have
interpreted de-differentiation of intelligence as an indication of
the predominant importance of central nervous system integrity in
determining individual differences in older populations. If this
interpretation were taken to its logical extreme, we would predict
that general intelligence should have a higher correlation with text
recall performance in the elderly than in any other population, yet
the pattern of effects in this study is in the opposite direction.
Apparently not all forms of cognitive activity increase their
correlation with g over the adult life span.

We should note, however, that text recall did correlate significantly
with the other three intelligence factors (which in turn were highly
intercorrelated). This pattern of effects might be taken to indicate
that a second-order verbal intelligence factor, uncorrelated with g,
correlates with text recall in the old group. This shift in
correlations is provocative, but some caution is in order given the
relatively small sample sizes. Certainly replication of these
differences in larger samples would be a necessary part of any
attempt to extend and explain these findings. We note, however, that
Hultsch et al. (1976) found higher correlations between psychometric
tests of Associative Memory and learning performance in an older
sample than in a young group of subjects. Although the experimental
tasks were not particularly comparable between the two studies, the
similar shift in correlations lends additional validity to the
present results.

We are inclined to view the shift in correlational pattern as a
developmental phenomenon meriting further study. However, one could
also argue that the group differences might have been produced
artifactually by differential selection. For example, group
differences in text recall-intelligence correlations could be a
function of group differences in variables such as education. We
found no indication that educational differences account for the
shift in correlational patterns, but we cannot rule out other types
of selection effects.

Assuming that the increased correlations actually do reflect some
type of developmental phenomenon, how might it be characterized? Of
the several possibilities, let us mention two. The first is that the
results may be a function of age-related differences in strategies
used to process the texts. There is recent evidence for such
differences. For example, Rice and Meyer (1983) found younger and
middle-aged adults are more likely than older adults to use a
strategy that emphasized serial retrieval of information based on an
understanding of the paragraph structure of the text. In contrast,
older adults were more likely than younger and middle-aged adults to
rely on a simpler strategy that emphasized the identification of the
main ideas of the text. To the extent that different intellectual
abilities support such different strategies, a changing pattern of
correlations as a function of age would be produced. In this
instance, then, abilities are functioning as indirect markers of
strategy use.

A second explanation of the shift in correlational patterns involves
the concept of differential loss of abilities that relate to text
memory performance. From this perspective, most young persons would
have sufficient semantic processing skills and memory for words to
perform adequately on text comprehension and recall tasks. Thus,
individual differences in text memory performance would not be
predicted by individual differences in intellectual abilities. In
older populations, on the other hand, it is possible that a subgroup
of older persons would have suffered a sufficient level of decline in
their semantic processing skills to cause declines in text recall
performance, whereas other older persons would have maintained their
skills. Such a pattern would increase the predictive value of
individual differences in associative memory and other semantic
processing skills for text recall performance in the older groups
because the range of individual differences in semantic processing
skills would include levels that would have an adverse impact on
performance on text recall tasks.

This interpretation is consistent with findings from the psychometric
literature concerning the terminal decline phenomenon (Riegel &
Riegel, 1972). It is well known that, on average, older persons are
more likely to decline in primary abilities related to fluid
intelligence, spatial visualization, or perceptual speed, but are
likely to maintain levels of crystallized intelligence, including
numerical and verbal abilities (see Horn & Donaldson, 1980). However,
the literature on nonnormative pathological decline prior to death
shows that the decline does not spare verbal skills. Indeed, the
phenomenon of terminal decline is best identified by the fact that
vocabulary and knowledge-oriented tests, which normally remain
relatively stable, decline (e.g., Blum & Jarvik, 1974). From the
differential loss perspective, one would argue that declines in text
recall performance are relatively nonnormative, in the sense that
they cannot be expected for all (or perhaps even a majority of)
elderly individuals. Instead, only some individuals in the older
population exhibit a sufficiently large decline in semantic
processing skills to adversely affect text recall performance. Such a
phenomenon could account for (a) the shift in the correlational
pattern of intelligence and text recall over different age groups;
(b) the inconsistency in the literature of studies finding age
differences in text memory performance, because finding mean
differences would depend on the relative proportion of the declining
elderly subpopulation sampled; and (c) the differential probability
of finding age differences in text memory among groups partitioned by
high and low verbal ability.
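
The differential loss argument can be illustrated with a small
simulation. This is a sketch under strong, purely hypothetical
assumptions: recall is limited by semantic skill only when skill
falls below a threshold, and a fixed fraction of the older population
has lost a fixed amount of skill. All names and parameter values are
invented for illustration and are not estimates from this study.

```python
import math
import random

THRESHOLD = 0.0  # hypothetical skill level below which recall suffers

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n, declined_fraction, seed):
    """Skill/recall correlation when a fraction of persons has lost skill."""
    rng = random.Random(seed)
    skill, recall = [], []
    for _ in range(n):
        s = rng.gauss(2.0, 1.0)              # intact skill, mostly sufficient
        if rng.random() < declined_fraction:
            s -= 3.0                         # hypothetical size of the loss
        shortfall = max(0.0, THRESHOLD - s)  # skill matters only below threshold
        skill.append(s)
        recall.append(20.0 - 4.0 * shortfall + rng.gauss(0.0, 2.0))
    return pearson(skill, recall)

r_intact = simulate(4000, 0.0, seed=1)  # no declining subgroup
r_mixed = simulate(4000, 0.3, seed=1)   # 30% declining subgroup
print(round(r_intact, 2), round(r_mixed, 2))
```

In this toy model the correlation is near zero when nearly everyone
has sufficient skill and becomes substantial once a declining
subgroup is mixed in, which mirrors the shift in correlational
pattern described above.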

Finally, some combination of these explanations is possible. For
example, differential decline may be the source of age-related
differences in encoding or retrieval strategies. Such a possibility
is consistent with recent findings reported by Spilich (1983). He
found evidence of poorer text performance in "normal" elderly
compared to younger adults, but not qualitative age differences in
text processing strategies. In contrast, he found evidence for such
qualitative differences between the "normal" elderly and
memory-impaired elderly.

Thus, poor text recall performance in later life may reflect two
different phenomena that are hopelessly confounded in a
cross-sectional design: first, low-ability subjects whose poor text
performance reflects the continuation of poor verbal skills over the
life span, and second, low-ability subjects whose poor text
performance reflects a loss of verbal skills from previously higher
levels. Clearly, a short-term longitudinal study examining changes in
intellectual abilities and text recall performance in middle-aged and
elderly adults would be required to examine these possibilities.

In summary, the present study clearly suggests that (a) text recall
performance in adulthood is predicted not only by Verbal
Comprehension, but by multiple abilities; (b) age differences in text
memory performance overlap highly with age differences in multiple
intellectual abilities, although ability differences do not fully
account for the age differences in text recall; (c) modest Age ×
Intellectual Ability interactions may exist, but the pattern of Age ×
Ability interactions does not suggest decreasing age differences in
text recall with increasing ability across the range of the ability
distribution; and (d) there are differences in the pattern of
within-age-group intelligence-text recall performance correlations.
The results may well be problematic for a representation of text
recall performance declines as simply quantitative changes in an
otherwise qualitatively invariant cognitive process. They also
suggest that cognitive psychologists should carefully examine the
semantic processing factors associated with text recall performance,
keeping in mind that accounting for individual differences in decline
functions may be the critical feature needed to solve the problem.